Workflow: Workflow Management

Introduction

The components of the Qlib framework are loosely coupled. Users can build their own quant research workflow from these components, as in the Example.

In addition, Qlib provides a more user-friendly interface named qrun, which automatically runs the whole workflow defined by a configuration file. Running the whole workflow is called an execution. With qrun, users can easily start an execution, which includes the following steps:

  • Data
    • Loading
    • Processing
    • Slicing
  • Model
    • Training and inference
    • Saving & loading
  • Evaluation
    • Forecast signal analysis
    • Backtest

For each execution, Qlib has a complete system for tracking all the information and artifacts generated during the training, inference and evaluation phases. For more information about how Qlib handles this, please refer to the related document: Recorder: Experiment Management.

Complete Example

Before getting into details, here is a complete example of qrun, which defines a typical workflow in quant research. Below is a typical qrun config file.

qlib_init:
    provider_uri: "~/.qlib/qlib_data/cn_data"
    region: cn
market: &market csi300
benchmark: &benchmark SH000300
data_handler_config: &data_handler_config
    start_time: 2008-01-01
    end_time: 2020-08-01
    fit_start_time: 2008-01-01
    fit_end_time: 2014-12-31
    instruments: *market
port_analysis_config: &port_analysis_config
    strategy:
        class: TopkDropoutStrategy
        module_path: qlib.contrib.strategy.strategy
        kwargs:
            topk: 50
            n_drop: 5
            signal: <PRED>
    backtest:
        limit_threshold: 0.095
        account: 100000000
        benchmark: *benchmark
        deal_price: close
        open_cost: 0.0005
        close_cost: 0.0015
        min_cost: 5
task:
    model:
        class: LGBModel
        module_path: qlib.contrib.model.gbdt
        kwargs:
            loss: mse
            colsample_bytree: 0.8879
            learning_rate: 0.0421
            subsample: 0.8789
            lambda_l1: 205.6999
            lambda_l2: 580.9768
            max_depth: 8
            num_leaves: 210
            num_threads: 20
    dataset:
        class: DatasetH
        module_path: qlib.data.dataset
        kwargs:
            handler:
                class: Alpha158
                module_path: qlib.contrib.data.handler
                kwargs: *data_handler_config
            segments:
                train: [2008-01-01, 2014-12-31]
                valid: [2015-01-01, 2016-12-31]
                test: [2017-01-01, 2020-08-01]
    record:
        - class: SignalRecord
          module_path: qlib.workflow.record_temp
          kwargs: {}
        - class: PortAnaRecord
          module_path: qlib.workflow.record_temp
          kwargs:
              config: *port_analysis_config

After saving the config into configuration.yaml, users can start the workflow and test their ideas with the single command below.

qrun configuration.yaml

To run qrun in debug mode, use the following command:

python -m pdb qlib/workflow/cli.py examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml

Note

qrun will be placed in your $PATH directory when installing Qlib.

Note

The symbol & in a YAML file defines an anchor for a field, which is useful when other fields include that value; the symbol * followed by the anchor name (for example *market) refers back to it. Taking the configuration file above as an example, users can change the value of market and benchmark in one place without traversing the entire configuration file.
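
As a rough analogy in plain Python (not a description of how a YAML parser works), an anchor behaves like a variable that is bound once and then reused wherever the matching * alias appears. The dictionary names below mirror the config above; dates are kept as strings for simplicity.

```python
# Plain-Python analogy for the YAML anchors in the config above.
# `&market csi300` binds a value to a name; `*market` reuses it.
market = "csi300"       # market: &market csi300
benchmark = "SH000300"  # benchmark: &benchmark SH000300

data_handler_config = {
    "start_time": "2008-01-01",
    "end_time": "2020-08-01",
    "fit_start_time": "2008-01-01",
    "fit_end_time": "2014-12-31",
    "instruments": market,  # instruments: *market
}

backtest_config = {
    "benchmark": benchmark,  # benchmark: *benchmark
}

print(data_handler_config["instruments"])  # csi300
print(backtest_config["benchmark"])        # SH000300
```

Changing the single assignment to market is enough to change every place that reuses it, which is the effect the anchor has in the YAML file.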

Configuration File

Let’s get into details of qrun in this section. Before using qrun, users need to prepare a configuration file. The following content shows how to prepare each part of the configuration file.

The design logic of the configuration file is very simple. It predefines a fixed workflow and provides this YAML interface for users to define how each component is initialized. It follows the design of init_instance_by_config: for each Qlib component it defines the initialization, which typically includes the class and the initialization arguments.

For example, the following yaml and code are equivalent.

model:
    class: LGBModel
    module_path: qlib.contrib.model.gbdt
    kwargs:
        loss: mse
        colsample_bytree: 0.8879
        learning_rate: 0.0421
        subsample: 0.8789
        lambda_l1: 205.6999
        lambda_l2: 580.9768
        max_depth: 8
        num_leaves: 210
        num_threads: 20

from qlib.contrib.model.gbdt import LGBModel

# Same hyperparameters as the kwargs field in the YAML above.
kwargs = {
    "loss": "mse",
    "colsample_bytree": 0.8879,
    "learning_rate": 0.0421,
    "subsample": 0.8789,
    "lambda_l1": 205.6999,
    "lambda_l2": 580.9768,
    "max_depth": 8,
    "num_leaves": 210,
    "num_threads": 20,
}
# Unpack the dict so that each entry is passed as a keyword argument.
LGBModel(**kwargs)
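
The class / module_path / kwargs convention can be illustrated with a few lines of standard-library Python. The function below, init_by_config, is a hypothetical, simplified stand-in and not Qlib's actual init_instance_by_config; it only shows the idea of importing a module, looking up a class by name, and calling it with keyword arguments. It is demonstrated on a standard-library class so that it runs without Qlib installed.

```python
import importlib


def init_by_config(config):
    """Instantiate the class described by a {class, module_path, kwargs} dict.

    Hypothetical helper for illustration; Qlib's own utility handles more cases.
    """
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    return cls(**config.get("kwargs", {}))


# Stand-in config that uses a standard-library class instead of a Qlib model.
config = {
    "class": "Fraction",
    "module_path": "fractions",
    "kwargs": {"numerator": 3, "denominator": 4},
}

obj = init_by_config(config)
print(type(obj).__name__, obj)  # Fraction 3/4
```

Because the class is resolved from strings at run time, swapping one component for another only requires editing the configuration, not the code that runs the workflow.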

Qlib Init Section

First, the configuration file needs to contain several basic parameters that are used to initialize Qlib.

qlib_init:
    provider_uri: "~/.qlib/qlib_data/cn_data"
    region: cn

The meaning of each field is as follows:

  • provider_uri

    Type: str. The URI of the Qlib data. For example, it could be the location where the data loaded by get_data.py are stored.

  • region
    • If region == "us", Qlib will be initialized in US-stock mode.
    • If region == "cn", Qlib will be initialized in China-stock mode.

    Note

    The value of region should be aligned with the data stored in provider_uri.
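
A small pre-flight check can catch a misspelled region before a run starts. The following is a minimal sketch: check_qlib_init is a hypothetical helper, not part of Qlib, and it accepts only the two region values described above. It does not verify that the data under provider_uri actually matches the region.

```python
import os


def check_qlib_init(section):
    """Validate a qlib_init mapping and expand `~` in provider_uri.

    Hypothetical helper; only the "cn" and "us" regions named in this
    document are accepted here.
    """
    region = section["region"]
    if region not in ("cn", "us"):
        raise ValueError(f"unsupported region: {region!r}")
    return {
        "provider_uri": os.path.expanduser(section["provider_uri"]),
        "region": region,
    }


checked = check_qlib_init(
    {"provider_uri": "~/.qlib/qlib_data/cn_data", "region": "cn"}
)
print(checked["region"])        # cn
print(checked["provider_uri"])  # path with ~ expanded to the home directory
```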

Task Section

The task field in the configuration corresponds to a task, which contains the parameters of three different subsections: Model, Dataset and Record.

Model Section

In the task field, the model section describes the parameters of the model to be used for training and inference. For more information about the base Model class, please refer to Qlib Model.

model:
    class: LGBModel
    module_path: qlib.contrib.model.gbdt
    kwargs:
        loss: mse
        colsample_bytree: 0.8879
        learning_rate: 0.0421
        subsample: 0.8789
        lambda_l1: 205.6999
        lambda_l2: 580.9768
        max_depth: 8
        num_leaves: 210
        num_threads: 20

The meaning of each field is as follows:

  • class
    Type: str. The name of the model class.
  • module_path
    Type: str. The module path of the model in Qlib.
  • kwargs
    The keyword arguments for the model. Please refer to the specific model implementation for more information: models.

Note

Qlib provides a utility named init_instance_by_config to initialize any class inside Qlib from a configuration that includes the fields class, module_path and kwargs.

Dataset Section

The dataset field describes the parameters for the Dataset module in Qlib as well as those for the DataHandler module. For more information about the Dataset module, please refer to Qlib Data.

The keyword-argument configuration of the DataHandler is as follows:

data_handler_config: &data_handler_config
    start_time: 2008-01-01
    end_time: 2020-08-01
    fit_start_time: 2008-01-01
    fit_end_time: 2014-12-31
    instruments: *market

Users can refer to the document of DataHandler for more information about the meaning of each field in the configuration.

Here is the configuration for the Dataset module, which takes care of data preprocessing and slicing during the training and testing phases.

dataset:
    class: DatasetH
    module_path: qlib.data.dataset
    kwargs:
        handler:
            class: Alpha158
            module_path: qlib.contrib.data.handler
            kwargs: *data_handler_config
        segments:
            train: [2008-01-01, 2014-12-31]
            valid: [2015-01-01, 2016-12-31]
            test: [2017-01-01, 2020-08-01]
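
To make the role of segments concrete, the sketch below slices a toy list of dated rows into the three segments from the configuration. It is a plain-Python illustration, not DatasetH itself; the helper slice_segment and the sample rows are hypothetical, and treating both segment bounds as inclusive is an assumption.

```python
from datetime import date

# Segment boundaries copied from the configuration above.
segments = {
    "train": (date(2008, 1, 1), date(2014, 12, 31)),
    "valid": (date(2015, 1, 1), date(2016, 12, 31)),
    "test": (date(2017, 1, 1), date(2020, 8, 1)),
}

# Toy rows: (date, feature value). Real data would come from a handler.
rows = [
    (date(2010, 6, 1), 0.1),
    (date(2014, 12, 31), 0.2),
    (date(2015, 1, 5), 0.3),
    (date(2019, 3, 8), 0.4),
]


def slice_segment(rows, name):
    """Return the rows whose date falls in the named segment (bounds inclusive)."""
    start, end = segments[name]
    return [row for row in rows if start <= row[0] <= end]


for name in segments:
    print(name, len(slice_segment(rows, name)))
# train 2
# valid 1
# test 1
```

The segments do not overlap, so no row is counted in more than one of train, valid and test.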

Record Section

The record field holds the parameters of the Record module in Qlib. Record is responsible for tracking the training process and results, such as the Information Coefficient (IC) and backtest results, in a standard format.
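
As a point of reference, the information coefficient is commonly computed as the correlation between predicted scores and realized returns. The sketch below computes a Pearson correlation on made-up numbers using only the standard library. The data and the helper name pearson_ic are illustrative, and Qlib's own record templates may compute and aggregate IC differently.

```python
import math


def pearson_ic(pred, label):
    """Pearson correlation between predicted scores and realized returns."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_l = sum(label) / n
    cov = sum((p - mean_p) * (l - mean_l) for p, l in zip(pred, label))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_l = sum((l - mean_l) ** 2 for l in label)
    return cov / math.sqrt(var_p * var_l)


# Made-up scores and returns for five instruments on one day.
pred = [0.02, -0.01, 0.03, 0.00, -0.02]
label = [0.04, -0.02, 0.06, 0.00, -0.04]

print(round(pearson_ic(pred, label), 4))  # 1.0, since label is exactly 2 * pred
```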

The following snippet configures the backtest and the strategy used in it:

port_analysis_config: &port_analysis_config
    strategy:
        class: TopkDropoutStrategy
        module_path: qlib.contrib.strategy.strategy
        kwargs:
            topk: 50
            n_drop: 5
            signal: <PRED>
    backtest:
        limit_threshold: 0.095
        account: 100000000
        benchmark: *benchmark
        deal_price: close
        open_cost: 0.0005
        close_cost: 0.0015
        min_cost: 5

For more information about the meaning of each field in configuration of strategy and backtest, users can look up the documents: Strategy and Backtest.
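
The cost fields lend themselves to a short worked example. The sketch below assumes that open_cost and close_cost are proportional rates applied to the traded value and that min_cost is a floor on the cost of a single trade. This reading, and the helper trade_cost, are assumptions made for illustration; the Backtest document is the place to confirm the exact rules.

```python
OPEN_COST = 0.0005   # rate charged when opening (buying) a position
CLOSE_COST = 0.0015  # rate charged when closing (selling) a position
MIN_COST = 5         # assumed minimum cost per trade


def trade_cost(trade_value, rate, min_cost=MIN_COST):
    """Cost of one trade: a proportional fee, but never below min_cost."""
    return max(trade_value * rate, min_cost)


print(round(trade_cost(1_000_000, OPEN_COST), 2))   # 500.0
print(round(trade_cost(1_000_000, CLOSE_COST), 2))  # 1500.0
print(round(trade_cost(2_000, OPEN_COST), 2))       # 5, the floor applies
```

Under these assumptions, a round trip on a position worth 1,000,000 costs about 2,000, while very small trades are dominated by the minimum cost.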

Here are the configuration details of different Record Templates such as SignalRecord and PortAnaRecord:

record:
    - class: SignalRecord
      module_path: qlib.workflow.record_temp
      kwargs: {}
    - class: PortAnaRecord
      module_path: qlib.workflow.record_temp
      kwargs:
        config: *port_analysis_config

For more information about the Record module in Qlib, users can refer to the related document: Record.