Custom Model Integration¶

Introduction¶

Qlib provides lightGBM and Dnn model as the baseline of Interday Model. In addition to the default model, users can integrate their own custom models into Qlib.

Users can integrate their own custom models according to the following steps.

Define a custom model class, which should be a subclass of the qlib.contrib.model.base.Model.
Write a configuration file that describes the path and parameters of the custom model.
Test the custom model.

Custom Model Class¶

The Custom models need to inherit qlib.contrib.model.base.Model and override the methods in it.

Override the __init__ method

Qlib passes the initialized parameters to the __init__ method.
The parameter must be consistent with the hyperparameters in the configuration file.
Code Example: In the following example, the hyperparameter filed of the configuration file should contain parameters such as loss:mse.

def __init__(self, loss='mse', **kwargs):
    if loss not in {'mse', 'binary'}:
        raise NotImplementedError
    self._scorer = mean_squared_error if loss == 'mse' else roc_auc_score
    self._params.update(objective=loss, **kwargs)
    self._model = None

Override the fit method

Qlib calls the fit method to train the model
The parameters must include training feature x_train, training label y_train, test feature x_valid, test label y_valid at least.
The parameters could include some optional parameters with default values, such as train weight w_train, test weight w_valid and num_boost_round = 1000.
Code Example: In the following example, num_boost_round = 1000 is an optional parameter.

def fit(self, x_train:pd.DataFrame, y_train:pd.DataFrame, x_valid:pd.DataFrame, y_valid:pd.DataFrame,
    w_train:pd.DataFrame = None, w_valid:pd.DataFrame = None, num_boost_round = 1000, **kwargs):

    # Lightgbm need 1D array as its label
    if y_train.values.ndim == 2 and y_train.values.shape[1] == 1:
        y_train_1d, y_valid_1d = np.squeeze(y_train.values), np.squeeze(y_valid.values)
    else:
        raise ValueError('LightGBM doesn\'t support multi-label training')

    w_train_weight = None if w_train is None else w_train.values
    w_valid_weight = None if w_valid is None else w_valid.values

    dtrain = lgb.Dataset(x_train.values, label=y_train_1d, weight=w_train_weight)
    dvalid = lgb.Dataset(x_valid.values, label=y_valid_1d, weight=w_valid_weight)
    self._model = lgb.train(
        self._params,
        dtrain,
        num_boost_round=num_boost_round,
        valid_sets=[dtrain, dvalid],
        valid_names=['train', 'valid'],
        **kwargs
    )

Override the predict method
- The parameters include the test features.
- Return the prediction score.
- Please refer to qlib.contrib.model.base.Model for the parameter types of the fit method.
- Code Example: In the following example, users need to use dnn to predict the label(such as preds) of test data x_test and return it.
def predict(self, x_test:pd.DataFrame, **kwargs)-> numpy.ndarray: if self._model is None: raise ValueError('model is not fitted yet!') return self._model.predict(x_test.values)

Override the score method

The parameters include the test features and test labels.
Return the evaluation score of the model. It’s recommended to adopt the loss between labels and prediction score.
Code Example: In the following example, users need to calculate the weighted loss with test data x_test, test label y_test and the weight w_test.

def score(self, x_test:pd.Dataframe, y_test:pd.Dataframe, w_test:pd.DataFrame = None) -> float:
    # Remove rows from x, y and w, which contain Nan in any columns in y_test.
    x_test, y_test, w_test = drop_nan_by_y_index(x_test, y_test, w_test)
    preds = self.predict(x_test)
    w_test_weight = None if w_test is None else w_test.values
    scorer = mean_squared_error if self.loss_type == 'mse' else roc_auc_score
    return scorer(y_test.values, preds, sample_weight=w_test_weight)

Override the save method & load method
- The save method parameter includes the a filename that represents an absolute path, user need to save model into the path.
- The load method parameter includes the a buffer read from the filename passed in the save method, users need to load model from the buffer.
- Code Example:
def save(self, filename): if self._model is None: raise ValueError('model is not fitted yet!') self._model.save_model(filename) def load(self, buffer): self._model = lgb.Booster(params={'model_str': buffer.decode('utf-8')})

Configuration File¶

The configuration file is described in detail in the estimator document. In order to integrate the custom model into Qlib, users need to modify the “model” field in the configuration file.

Example: The following example describes the model field of configuration file about the custom lightgbm model mentioned above, where module_path is the module path, class is the class name, and args is the hyperparameter passed into the __init__ method. All parameters in the field is passed to self._params by **kwargs in __init__ except loss = mse.

model:
    class: LGBModel
    module_path: qlib.contrib.model.gbdt
    args:
        loss: mse
        colsample_bytree: 0.8879
        learning_rate: 0.0421
        subsample: 0.8789
        lambda_l1: 205.6999
        lambda_l2: 580.9768
        max_depth: 8
        num_leaves: 210
        num_threads: 20

Users could find configuration file of the baseline of the Model in qlib/examples/estimator/estimator_config.yaml and qlib/examples/estimator/estimator_config_dnn.yaml

Model Testing¶

Assuming that the configuration file is examples/estimator/estimator_config.yaml, users can run the following command to test the custom model:

cd examples  # Avoid running program under the directory contains `qlib`
estimator -c estimator/estimator_config.yaml

Note

estimator is a built-in command of Qlib.

Also, Model can also be tested as a single module. An example has been given in examples/train_backtest_analyze.ipynb.

Reference¶

To know more about Model, please refer to Interday Model: Model Training & Prediction and Model API.