Custom Model Integration

Introduction

Qlib’s Model Zoo includes models such as LightGBM, MLP, LSTM, etc.. These models are examples of Forecast Model. In addition to the default models Qlib provide, users can integrate their own custom models into Qlib.

Users can integrate their own custom models according to the following steps.

Define a custom model class, which should be a subclass of the qlib.model.base.Model.
Write a configuration file that describes the path and parameters of the custom model.
Test the custom model.

Custom Model Class

The Custom models need to inherit qlib.model.base.Model and override the methods in it.

Override the __init__ method
- Qlib passes the initialized parameters to the __init__ method.
- The hyperparameters of model in the configuration must be consistent with those defined in the __init__ method.
- Code Example: In the following example, the hyperparameters of model in the configuration file should contain parameters such as loss:mse.
  def __init__(self, loss='mse', **kwargs): if loss not in {'mse', 'binary'}: raise NotImplementedError self._scorer = mean_squared_error if loss == 'mse' else roc_auc_score self._params.update(objective=loss, **kwargs) self._model = None

Override the fit method

Qlib calls the fit method to train the model.
The parameters must include training feature dataset, which is designed in the interface.
The parameters could include some optional parameters with default values, such as num_boost_round = 1000 for GBDT.

Code Example: In the following example, num_boost_round = 1000 is an optional parameter.

def fit(self, dataset: DatasetH, num_boost_round = 1000, **kwargs):

    # prepare dataset for lgb training and evaluation
    df_train, df_valid = dataset.prepare(
        ["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
    )
    x_train, y_train = df_train["feature"], df_train["label"]
    x_valid, y_valid = df_valid["feature"], df_valid["label"]

    # Lightgbm need 1D array as its label
    if y_train.values.ndim == 2 and y_train.values.shape[1] == 1:
        y_train, y_valid = np.squeeze(y_train.values), np.squeeze(y_valid.values)
    else:
        raise ValueError("LightGBM doesn't support multi-label training")

    dtrain = lgb.Dataset(x_train.values, label=y_train)
    dvalid = lgb.Dataset(x_valid.values, label=y_valid)

    # fit the model
    self.model = lgb.train(
        self.params,
        dtrain,
        num_boost_round=num_boost_round,
        valid_sets=[dtrain, dvalid],
        valid_names=["train", "valid"],
        early_stopping_rounds=early_stopping_rounds,
        verbose_eval=verbose_eval,
        evals_result=evals_result,
        **kwargs
    )

Override the predict method

The parameters must include the parameter dataset, which will be used to get the test dataset.
Return the prediction score.
Please refer to Model API for the parameter types of the fit method.

Code Example: In the following example, users need to use LightGBM to predict the label(such as preds) of test data x_test and return it.

def predict(self, dataset: DatasetH, **kwargs)-> pandas.Series:
    if self.model is None:
        raise ValueError("model is not fitted yet!")
    x_test = dataset.prepare("test", col_set="feature", data_key=DataHandlerLP.DK_I)
    return pd.Series(self.model.predict(x_test.values), index=x_test.index)

Override the finetune method (Optional)

This method is optional to the users. When users want to use this method on their own models, they should inherit the ModelFT base class, which includes the interface of finetune.
The parameters must include the parameter dataset.

Code Example: In the following example, users will use LightGBM as the model and finetune it.

def finetune(self, dataset: DatasetH, num_boost_round=10, verbose_eval=20):
    # Based on existing model and finetune by train more rounds
    dtrain, _ = self._prepare_data(dataset)
    self.model = lgb.train(
        self.params,
        dtrain,
        num_boost_round=num_boost_round,
        init_model=self.model,
        valid_sets=[dtrain],
        valid_names=["train"],
        verbose_eval=verbose_eval,
    )

Configuration File

The configuration file is described in detail in the Workflow document. In order to integrate the custom model into Qlib, users need to modify the “model” field in the configuration file. The configuration describes which models to use and how we can initialize it.

Example: The following example describes the model field of configuration file about the custom lightgbm model mentioned above, where module_path is the module path, class is the class name, and args is the hyperparameter passed into the __init__ method. All parameters in the field is passed to self._params by **kwargs in __init__ except loss = mse.

model:
    class: LGBModel
    module_path: qlib.contrib.model.gbdt
    args:
        loss: mse
        colsample_bytree: 0.8879
        learning_rate: 0.0421
        subsample: 0.8789
        lambda_l1: 205.6999
        lambda_l2: 580.9768
        max_depth: 8
        num_leaves: 210
        num_threads: 20

Users could find configuration file of the baselines of the Model in examples/benchmarks. All the configurations of different models are listed under the corresponding model folder.

Model Testing

Assuming that the configuration file is examples/benchmarks/LightGBM/workflow_config_lightgbm.yaml, users can run the following command to test the custom model:

cd examples  # Avoid running program under the directory contains `qlib`
qrun benchmarks/LightGBM/workflow_config_lightgbm.yaml

Note

qrun is a built-in command of Qlib.

Also, Model can also be tested as a single module. An example has been given in examples/workflow_by_code.ipynb.

Reference

To know more about Forecast Model, please refer to Forecast Model: Model Training & Prediction and Model API.