# Workflow: Workflow Management

## Introduction

The components in the Qlib framework are designed in a loosely-coupled way. Users can build their own Quant research workflow with these components, as demonstrated in the examples.

Besides, Qlib provides a more user-friendly interface named qrun to automatically run the whole workflow defined by a configuration file. Running the whole workflow is called an execution. With qrun, users can easily start an execution, which includes the following steps:

• Data
  • Processing
  • Slicing
• Model
  • Training and inference
• Evaluation
  • Forecast signal analysis
  • Backtest

For each execution, Qlib has a complete system for tracking all the information as well as the artifacts generated during the training, inference and evaluation phases. For more information about how Qlib handles this, please refer to the related document: Recorder: Experiment Management.

## Complete Example

Before getting into details, here is a complete example: a typical qrun configuration file that defines a Quant research workflow.

```yaml
qlib_init:
    provider_uri: "~/.qlib/qlib_data/cn_data"
    region: cn
market: &market csi300
benchmark: &benchmark SH000300
data_handler_config: &data_handler_config
    start_time: 2008-01-01
    end_time: 2020-08-01
    fit_start_time: 2008-01-01
    fit_end_time: 2014-12-31
    instruments: *market
port_analysis_config: &port_analysis_config
    strategy:
        class: TopkDropoutStrategy
        module_path: qlib.contrib.strategy.strategy
        kwargs:
            topk: 50
            n_drop: 5
            signal:
                - <MODEL>
                - <DATASET>
    backtest:
        limit_threshold: 0.095
        account: 100000000
        benchmark: *benchmark
        deal_price: close
        open_cost: 0.0005
        close_cost: 0.0015
        min_cost: 5
task:
    model:
        class: LGBModel
        module_path: qlib.contrib.model.gbdt
        kwargs:
            loss: mse
            colsample_bytree: 0.8879
            learning_rate: 0.0421
            subsample: 0.8789
            lambda_l1: 205.6999
            lambda_l2: 580.9768
            max_depth: 8
            num_leaves: 210
    dataset:
        class: DatasetH
        module_path: qlib.data.dataset
        kwargs:
            handler:
                class: Alpha158
                module_path: qlib.contrib.data.handler
                kwargs: *data_handler_config
            segments:
                train: [2008-01-01, 2014-12-31]
                valid: [2015-01-01, 2016-12-31]
                test: [2017-01-01, 2020-08-01]
    record:
        - class: SignalRecord
          module_path: qlib.workflow.record_temp
          kwargs: {}
        - class: PortAnaRecord
          module_path: qlib.workflow.record_temp
          kwargs:
              config: *port_analysis_config
```


After saving the config into configuration.yaml, users can start the workflow and test their ideas with a single command:

```shell
qrun configuration.yaml
```


To run qrun in debug mode, use the following command:

```shell
python -m pdb qlib/workflow/cli.py examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml
```


Note

qrun will be placed in your $PATH directory when Qlib is installed.

Note

The symbol & in a yaml file stands for an anchor of a field, which is useful when other fields include this parameter as part of their value. Taking the configuration file above as an example, users can directly change the values of market and benchmark without traversing the entire configuration file.
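The anchor/alias mechanism is plain YAML, so its effect can be verified with any YAML parser. Below is a minimal sketch using PyYAML (assumed to be installed; it is not part of Qlib itself):

```python
import yaml  # PyYAML, assumed available

# '&market' defines an anchor; '*market' reuses its value elsewhere.
snippet = """
market: &market csi300
data_handler_config:
    instruments: *market
"""

config = yaml.safe_load(snippet)
# Changing the single 'market' line would update every alias automatically.
print(config["data_handler_config"]["instruments"])  # csi300
```

This is why the example configuration only needs one edit to switch markets: every `*market` alias resolves to the anchored value at load time.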

## Configuration File

Let’s get into details of qrun in this section. Before using qrun, users need to prepare a configuration file. The following content shows how to prepare each part of the configuration file.

The design logic of the configuration file is very simple. It predefines fixed workflows and provides this yaml interface for users to define how to initialize each component. It follows the design of init_instance_by_config. It defines the initialization of each component of Qlib, which typically includes the class and the initialization arguments.

For example, the following yaml and code are equivalent.

```yaml
model:
    class: LGBModel
    module_path: qlib.contrib.model.gbdt
    kwargs:
        loss: mse
        colsample_bytree: 0.8879
        learning_rate: 0.0421
        subsample: 0.8789
        lambda_l1: 205.6999
        lambda_l2: 580.9768
        max_depth: 8
        num_leaves: 210
```

```python
from qlib.contrib.model.gbdt import LGBModel

kwargs = {
    "loss": "mse",
    "colsample_bytree": 0.8879,
    "learning_rate": 0.0421,
    "subsample": 0.8789,
    "lambda_l1": 205.6999,
    "lambda_l2": 580.9768,
    "max_depth": 8,
    "num_leaves": 210,
}
LGBModel(**kwargs)
```
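The mechanism behind this equivalence can be sketched in a few lines. The following is a simplified illustration of what init_instance_by_config does, not Qlib's actual implementation (which also handles nested configs and already-instantiated objects); a standard-library class stands in for a Qlib model so the snippet is self-contained:

```python
import importlib

def init_instance_sketch(config):
    """Instantiate config['class'] from config['module_path'] with
    config['kwargs'] -- a minimal illustration only."""
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    return cls(**config.get("kwargs", {}))

# A stdlib class as a stand-in; with Qlib installed, the same shape works for
# {"class": "LGBModel", "module_path": "qlib.contrib.model.gbdt", "kwargs": {...}}.
instance = init_instance_sketch({
    "class": "date",
    "module_path": "datetime",
    "kwargs": {"year": 2020, "month": 8, "day": 1},
})
print(instance)  # 2020-08-01
```

This is also why every component section in the yaml file shares the same three fields: class, module_path and kwargs.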


### Qlib Init Section

First, the configuration file needs to contain several basic parameters that will be used for Qlib initialization.

```yaml
provider_uri: "~/.qlib/qlib_data/cn_data"
region: cn
```


The meaning of each field is as follows:

• provider_uri
  Type: str. The URI of the Qlib data. For example, it could be the location where the data loaded by get_data.py are stored.
• region
  • If region == "us", Qlib will be initialized in US-stock mode.
  • If region == "cn", Qlib will be initialized in China-stock mode.

Note

The value of region should be aligned with the data stored in provider_uri.

The task field in the configuration corresponds to a task, which contains the parameters of three different subsections: Model, Dataset and Record.

#### Model Section

In the task field, the model section describes the parameters of the model to be used for training and inference. For more information about the base Model class, please refer to Qlib Model.

```yaml
model:
    class: LGBModel
    module_path: qlib.contrib.model.gbdt
    kwargs:
        loss: mse
        colsample_bytree: 0.8879
        learning_rate: 0.0421
        subsample: 0.8789
        lambda_l1: 205.6999
        lambda_l2: 580.9768
        max_depth: 8
        num_leaves: 210
```


The meaning of each field is as follows:

• class
Type: str. The name for the model class.
• module_path
Type: str. The path for the model in qlib.
• kwargs
The keyword arguments for the model. Please refer to the specific model implementation for more information: models.

Note

Qlib provides a utility named init_instance_by_config to initialize any class inside Qlib with a configuration that includes the fields class, module_path and kwargs.

#### Dataset Section

The dataset field describes the parameters for the Dataset module in Qlib as well as those for the module DataHandler. For more information about the Dataset module, please refer to the related document: Dataset.

The keyword arguments configuration of the DataHandler is as follows:

```yaml
data_handler_config: &data_handler_config
    start_time: 2008-01-01
    end_time: 2020-08-01
    fit_start_time: 2008-01-01
    fit_end_time: 2014-12-31
    instruments: *market
```


Users can refer to the document of DataHandler for more information about the meaning of each field in the configuration.

Here is the configuration for the Dataset module, which will take care of data preprocessing and slicing during the training and testing phases.

```yaml
dataset:
    class: DatasetH
    module_path: qlib.data.dataset
    kwargs:
        handler:
            class: Alpha158
            module_path: qlib.contrib.data.handler
            kwargs: *data_handler_config
        segments:
            train: [2008-01-01, 2014-12-31]
            valid: [2015-01-01, 2016-12-31]
            test: [2017-01-01, 2020-08-01]
```
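The segments field simply names the date ranges that the dataset uses to slice the handler's prepared data. The idea can be sketched with plain Python; this is a toy illustration of the concept with hypothetical rows, not Qlib's DatasetH implementation:

```python
from datetime import date

# Hypothetical daily rows keyed by date, standing in for handler output.
rows = {date(2014, 12, 31): 0.1, date(2015, 6, 1): 0.2, date(2018, 1, 2): 0.3}

segments = {
    "train": (date(2008, 1, 1), date(2014, 12, 31)),
    "valid": (date(2015, 1, 1), date(2016, 12, 31)),
    "test": (date(2017, 1, 1), date(2020, 8, 1)),
}

def prepare(segment):
    """Return the rows whose date falls inside the named segment,
    mimicking how a dataset slices data for 'train'/'valid'/'test'."""
    start, end = segments[segment]
    return {d: v for d, v in rows.items() if start <= d <= end}

print(prepare("train"))  # only the 2014-12-31 row
```

Note that the segment boundaries are inclusive on both ends, matching how the train segment above ends on the same date that fit_end_time uses in the data handler configuration.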


#### Record Section

The record field is about the parameters of the Record module in Qlib. Record is responsible for tracking the training process and results, such as Information Coefficient (IC) and backtest results, in a standard format.

The following is the configuration of the backtest and of the strategy used in it:

```yaml
port_analysis_config: &port_analysis_config
    strategy:
        class: TopkDropoutStrategy
        module_path: qlib.contrib.strategy.strategy
        kwargs:
            topk: 50
            n_drop: 5
            signal:
                - <MODEL>
                - <DATASET>
    backtest:
        limit_threshold: 0.095
        account: 100000000
        benchmark: *benchmark
        deal_price: close
        open_cost: 0.0005
        close_cost: 0.0015
        min_cost: 5
```
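To make the cost fields concrete, here is a small illustration of how open_cost, close_cost and min_cost are typically combined in a backtest: a proportional fee with a per-trade floor. This is a sketch of the general convention, not Qlib's Exchange implementation, which applies these parameters inside a fuller trading simulation:

```python
def trade_cost(trade_value, rate, min_cost=5.0):
    """Proportional cost with a floor, as in the config above:
    0.05% to open a position, 0.15% to close, at least min_cost per trade."""
    return max(trade_value * rate, min_cost)

# A 1,000,000 buy at open_cost=0.0005 costs 500; a tiny trade hits the floor.
print(trade_cost(1_000_000, 0.0005))  # 500.0
print(trade_cost(1_000, 0.0015))      # 5.0 (min_cost floor)
```

The limit_threshold field (0.095 here) reflects the roughly 10% daily price-limit rule in the China market: orders are rejected when a stock has moved to its limit.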


For more information about the meaning of each field in the strategy and backtest configuration, users can look up the documents: Strategy and Backtest.

Here are the configuration details of different Record Templates, such as SignalRecord and PortAnaRecord:

```yaml
record:
    - class: SignalRecord
      module_path: qlib.workflow.record_temp
      kwargs: {}
    - class: PortAnaRecord
      module_path: qlib.workflow.record_temp
      kwargs:
          config: *port_analysis_config
```


For more information about the Record module in Qlib, users can refer to the related document: Record.