Workflow: Workflow Management¶
Introduction¶
The components in Qlib Framework are designed in a loosely-coupled way. Users could build their own Quant research workflow with these components like Example.
Besides, Qlib
provides more user-friendly interfaces named qrun
to automatically run the whole workflow defined by configuration. Running the whole workflow is called an execution.
With qrun
, user can easily start an execution, which includes the following steps:
- Data
- Loading
- Processing
- Slicing
- Model
- Training and inference
- Saving & loading
- Evaluation
- Forecast signal analysis
- Backtest
For each execution, Qlib
has a complete system to tracking all the information as well as artifacts generated during training, inference and evaluation phase. For more information about how Qlib
handles this, please refer to the related document: Recorder: Experiment Management.
Complete Example¶
Before getting into details, here is a complete example of qrun
, which defines the workflow in typical Quant research.
Below is a typical config file of qrun
.
qlib_init:
provider_uri: "~/.qlib/qlib_data/cn_data"
region: cn
market: &market csi300
benchmark: &benchmark SH000300
data_handler_config: &data_handler_config
start_time: 2008-01-01
end_time: 2020-08-01
fit_start_time: 2008-01-01
fit_end_time: 2014-12-31
instruments: *market
port_analysis_config: &port_analysis_config
strategy:
class: TopkDropoutStrategy
module_path: qlib.contrib.strategy.strategy
kwargs:
topk: 50
n_drop: 5
backtest:
verbose: False
limit_threshold: 0.095
account: 100000000
benchmark: *benchmark
deal_price: close
open_cost: 0.0005
close_cost: 0.0015
min_cost: 5
task:
model:
class: LGBModel
module_path: qlib.contrib.model.gbdt
kwargs:
loss: mse
colsample_bytree: 0.8879
learning_rate: 0.0421
subsample: 0.8789
lambda_l1: 205.6999
lambda_l2: 580.9768
max_depth: 8
num_leaves: 210
num_threads: 20
dataset:
class: DatasetH
module_path: qlib.data.dataset
kwargs:
handler:
class: Alpha158
module_path: qlib.contrib.data.handler
kwargs: *data_handler_config
segments:
train: [2008-01-01, 2014-12-31]
valid: [2015-01-01, 2016-12-31]
test: [2017-01-01, 2020-08-01]
record:
- class: SignalRecord
module_path: qlib.workflow.record_temp
kwargs: {}
- class: PortAnaRecord
module_path: qlib.workflow.record_temp
kwargs:
config: *port_analysis_config
After saving the config into configuration.yaml, users could start the workflow and test their ideas with a single command below.
qrun configuration.yaml
Note
qrun will be placed in your $PATH directory when installing Qlib
.
Note
The symbol & in yaml file stands for an anchor of a field, which is useful when another fields include this parameter as part of the value. Taking the configuration file above as an example, users can directly change the value of market and benchmark without traversing the entire configuration file.
Configuration File¶
Let’s get into details of qrun
in this section.
Before using qrun
, users need to prepare a configuration file. The following content shows how to prepare each part of the configuration file.
Qlib Init Section¶
At first, the configuration file needs to contain several basic parameters which will be used for qlib initialization.
provider_uri: "~/.qlib/qlib_data/cn_data"
region: cn
The meaning of each field is as follows:
- provider_uri
Type: str. The URI of the Qlib data. For example, it could be the location where the data loaded by
get_data.py
are stored.
- region
- If region == “us”,
Qlib
will be initialized in US-stock mode. - If region == “cn”,
Qlib
will be initialized in china-stock mode.
Note
The value of region should be aligned with the data stored in provider_uri.
- If region == “us”,
Task Section¶
The task field in the configuration corresponds to a task, which contains the parameters of three different subsections: Model, Dataset and Record.
Model Section¶
In the task field, the model section describes the parameters of the model to be used for training and inference. For more information about the base Model
class, please refer to Qlib Model.
model:
class: LGBModel
module_path: qlib.contrib.model.gbdt
kwargs:
loss: mse
colsample_bytree: 0.8879
learning_rate: 0.0421
subsample: 0.8789
lambda_l1: 205.6999
lambda_l2: 580.9768
max_depth: 8
num_leaves: 210
num_threads: 20
The meaning of each field is as follows:
- class
- Type: str. The name for the model class.
- module_path
- Type: str. The path for the model in qlib.
- kwargs
- The keywords arguments for the model. Please refer to the specific model implementation for more information: models.
Note
Qlib
provides a util named: init_instance_by_config
to initialize any class inside Qlib
with the configuration includes the fields: class, module_path and kwargs.
Dataset Section¶
The dataset field describes the parameters for the Dataset
module in Qlib
as well those for the module DataHandler
. For more information about the Dataset
module, please refer to Qlib Model.
The keywords arguments configuration of the DataHandler
is as follows:
data_handler_config: &data_handler_config
start_time: 2008-01-01
end_time: 2020-08-01
fit_start_time: 2008-01-01
fit_end_time: 2014-12-31
instruments: *market
Users can refer to the document of DataHandler for more information about the meaning of each field in the configuration.
Here is the configuration for the Dataset
module which will take care of data preprossing and slicing during the training and testing phase.
dataset:
class: DatasetH
module_path: qlib.data.dataset
kwargs:
handler:
class: Alpha158
module_path: qlib.contrib.data.handler
kwargs: *data_handler_config
segments:
train: [2008-01-01, 2014-12-31]
valid: [2015-01-01, 2016-12-31]
test: [2017-01-01, 2020-08-01]
Record Section¶
The record field is about the parameters the Record
module in Qlib
. Record
is responsible for tracking training process and results such as information Coefficient (IC) and backtest in a standard format.
The following script is the configuration of backtest and the strategy used in backtest:
port_analysis_config: &port_analysis_config
strategy:
class: TopkDropoutStrategy
module_path: qlib.contrib.strategy.strategy
kwargs:
topk: 50
n_drop: 5
backtest:
verbose: False
limit_threshold: 0.095
account: 100000000
benchmark: *benchmark
deal_price: close
open_cost: 0.0005
close_cost: 0.0015
min_cost: 5
For more information about the meaning of each field in configuration of strategy and backtest, users can look up the documents: Strategy and Backtest.
Here is the configuration details of different Record Template such as SignalRecord
and PortAnaRecord
:
record:
- class: SignalRecord
module_path: qlib.workflow.record_temp
kwargs: {}
- class: PortAnaRecord
module_path: qlib.workflow.record_temp
kwargs:
config: *port_analysis_config
For more information about the Record
module in Qlib
, user can refer to the related document: Record.