Custom Model Integration
Introduction
Qlib provides LightGBM and DNN models as the baselines of Interday Model. In addition to the default models, users can integrate their own custom models into Qlib.
Users can integrate their own custom models according to the following steps.
- Define a custom model class, which should be a subclass of qlib.contrib.model.base.Model.
- Write a configuration file that describes the path and parameters of the custom model.
- Test the custom model.
Custom Model Class
Custom models need to inherit from qlib.contrib.model.base.Model and override its methods.
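- Override the __init__ method
  - Qlib passes the initialization parameters to the __init__ method.
  - The parameters must be consistent with the hyperparameters in the configuration file.
  - Code Example: In the following example, the hyperparameter field of the configuration file should contain parameters such as loss: mse.

    # Module-level import used by this method
    from sklearn.metrics import mean_squared_error, roc_auc_score

    def __init__(self, loss='mse', **kwargs):
        # Only 'mse' (regression) and 'binary' (classification) are supported
        if loss not in {'mse', 'binary'}:
            raise NotImplementedError
        self._scorer = mean_squared_error if loss == 'mse' else roc_auc_score
        # Keep the parameters per instance; the remaining keyword arguments
        # are stored as LightGBM training parameters
        self._params = dict(objective=loss, **kwargs)
        self._model = None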
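To illustrate how hyperparameters from the configuration file reach __init__, the sketch below unpacks an args mapping into a constructor with the same signature as the example above. DemoModel is a hypothetical stand-in that does not inherit from the Qlib base class, and the args values are illustrative only.

```python
# Hypothetical stand-in that mirrors only the __init__ signature shown above.
class DemoModel:
    def __init__(self, loss="mse", **kwargs):
        if loss not in {"mse", "binary"}:
            raise NotImplementedError(f"unsupported loss: {loss}")
        # `loss` is bound by name; every other key lands in **kwargs.
        self._params = dict(objective=loss, **kwargs)
        self._model = None


# Illustrative `args` mapping, as it might be parsed from the configuration file.
args = {"loss": "mse", "learning_rate": 0.0421, "max_depth": 8}

model = DemoModel(**args)
print(model._params)
```

A key in the configuration file that __init__ does not name explicitly is collected by **kwargs, so a misspelled hyperparameter is stored silently rather than rejected.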
- Override the fit method
  - Qlib calls the fit method to train the model.
  - The parameters must include at least the training features x_train, the training labels y_train, the validation features x_valid, and the validation labels y_valid.
  - The parameters may also include optional parameters with default values, such as the training weights w_train, the validation weights w_valid, and num_boost_round = 1000.
  - Code Example: In the following example, num_boost_round = 1000 is an optional parameter.

    # Module-level imports used by this method
    import lightgbm as lgb
    import numpy as np
    import pandas as pd

    def fit(self, x_train: pd.DataFrame, y_train: pd.DataFrame,
            x_valid: pd.DataFrame, y_valid: pd.DataFrame,
            w_train: pd.DataFrame = None, w_valid: pd.DataFrame = None,
            num_boost_round=1000, **kwargs):
        # LightGBM needs a 1D array as its label
        if y_train.values.ndim == 2 and y_train.values.shape[1] == 1:
            y_train_1d, y_valid_1d = np.squeeze(y_train.values), np.squeeze(y_valid.values)
        else:
            raise ValueError('LightGBM doesn\'t support multi-label training')

        w_train_weight = None if w_train is None else w_train.values
        w_valid_weight = None if w_valid is None else w_valid.values

        dtrain = lgb.Dataset(x_train.values, label=y_train_1d, weight=w_train_weight)
        dvalid = lgb.Dataset(x_valid.values, label=y_valid_1d, weight=w_valid_weight)
        self._model = lgb.train(
            self._params,
            dtrain,
            num_boost_round=num_boost_round,
            valid_sets=[dtrain, dvalid],
            valid_names=['train', 'valid'],
            **kwargs
        )
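The signature requirement above can be checked before a model is handed to Qlib. The following is a minimal sketch using only the standard library; missing_fit_params, GoodModel, and IncompleteModel are hypothetical names introduced for illustration and are not part of Qlib.

```python
import inspect

# Parameter names the text above says fit must accept at least.
REQUIRED_FIT_PARAMS = ("x_train", "y_train", "x_valid", "y_valid")


def missing_fit_params(fit_method):
    """Return the required parameter names that `fit_method` does not declare."""
    declared = inspect.signature(fit_method).parameters
    return [name for name in REQUIRED_FIT_PARAMS if name not in declared]


class GoodModel:
    def fit(self, x_train, y_train, x_valid, y_valid,
            w_train=None, w_valid=None, num_boost_round=1000, **kwargs):
        pass


class IncompleteModel:
    # No validation arguments, so this signature does not meet the requirement.
    def fit(self, x_train, y_train, **kwargs):
        pass


print(missing_fit_params(GoodModel.fit))        # []
print(missing_fit_params(IncompleteModel.fit))  # ['x_valid', 'y_valid']
```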
- Override the predict method
  - The parameters include the test features.
  - Return the prediction score.
  - Please refer to qlib.contrib.model.base.Model for the parameter types of the predict method.
  - Code Example: In the following example, users need to use the fitted model to predict the labels (such as preds) of the test data x_test and return them.

    def predict(self, x_test: pd.DataFrame, **kwargs) -> np.ndarray:
        # Refuse to predict before fit (or load) has set the model
        if self._model is None:
            raise ValueError('model is not fitted yet!')
        return self._model.predict(x_test.values)
- Override the score method
  - The parameters include the test features and test labels.
  - Return the evaluation score of the model. It is recommended to use the loss between the labels and the prediction scores.
  - Code Example: In the following example, users need to calculate the weighted loss with the test data x_test, the test labels y_test, and the weights w_test.

    def score(self, x_test: pd.DataFrame, y_test: pd.DataFrame, w_test: pd.DataFrame = None) -> float:
        # Remove rows from x, y and w that contain NaN in any column of y_test.
        x_test, y_test, w_test = drop_nan_by_y_index(x_test, y_test, w_test)
        preds = self.predict(x_test)
        w_test_weight = None if w_test is None else w_test.values
        # self._scorer was chosen in __init__ according to the loss
        return self._scorer(y_test.values, preds, sample_weight=w_test_weight)
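For the 'mse' case, the weighted loss in the example above amounts to a weighted average of squared errors over the rows whose label is present. The pure-Python sketch below spells out that arithmetic, assuming that drop_nan_by_y_index removes the rows with a NaN label as its comment states. weighted_mse is a hypothetical helper and the numbers are made up.

```python
import math


def weighted_mse(y_true, y_pred, weights=None):
    """Weighted mean squared error that skips samples whose label is NaN."""
    if weights is None:
        weights = [1.0] * len(y_true)
    # Keep only the samples with a valid label, together with their weights.
    kept = [(y, p, w) for y, p, w in zip(y_true, y_pred, weights) if not math.isnan(y)]
    total_weight = sum(w for _, _, w in kept)
    return sum(w * (y - p) ** 2 for y, p, w in kept) / total_weight


labels = [1.0, 2.0, float("nan"), 4.0]
preds = [1.5, 2.0, 3.0, 3.0]
weights = [1.0, 1.0, 1.0, 2.0]

# (1 * 0.25 + 1 * 0.0 + 2 * 1.0) / (1 + 1 + 2) = 0.5625
print(weighted_mse(labels, preds, weights))
```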
- Override the save method & load method
  - The save method takes a filename that represents an absolute path; users need to save the model to that path.
  - The load method takes a buffer read from the file written by the save method; users need to load the model from the buffer.
  - Code Example:

    def save(self, filename):
        if self._model is None:
            raise ValueError('model is not fitted yet!')
        self._model.save_model(filename)

    def load(self, buffer):
        # buffer holds the bytes of the file written by save
        self._model = lgb.Booster(params={'model_str': buffer.decode('utf-8')})
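Note the asymmetry between the two methods: save receives a path, while load receives the bytes already read from that path. The round trip can be sketched with the standard library as follows. TinyModel is a hypothetical model that uses pickle instead of LightGBM, and the file name is arbitrary.

```python
import os
import pickle
import tempfile


class TinyModel:
    """Hypothetical model whose fitted state is a single number."""

    def __init__(self):
        self._model = None

    def fit(self, values):
        self._model = sum(values) / len(values)

    def save(self, filename):
        if self._model is None:
            raise ValueError("model is not fitted yet!")
        with open(filename, "wb") as f:
            pickle.dump(self._model, f)

    def load(self, buffer):
        # `buffer` is the raw bytes read from the saved file, not a path.
        self._model = pickle.loads(buffer)


trained = TinyModel()
trained.fit([1.0, 2.0, 3.0])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tiny_model.bin")
    trained.save(path)
    # The caller reads the file and hands the bytes to load().
    with open(path, "rb") as f:
        buffer = f.read()

restored = TinyModel()
restored.load(buffer)
print(restored._model)  # 2.0
```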
Configuration File
The configuration file is described in detail in the estimator document. In order to integrate the custom model into Qlib, users need to modify the "model" field in the configuration file.
- Example: The following example shows the model field of the configuration file for the custom LightGBM model mentioned above, where module_path is the module path, class is the class name, and args holds the hyperparameters passed into the __init__ method. All parameters in args except loss: mse are passed to self._params through **kwargs in __init__.

    model:
        class: LGBModel
        module_path: qlib.contrib.model.gbdt
        args:
            loss: mse
            colsample_bytree: 0.8879
            learning_rate: 0.0421
            subsample: 0.8789
            lambda_l1: 205.6999
            lambda_l2: 580.9768
            max_depth: 8
            num_leaves: 210
            num_threads: 20

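The three keys of the model field are enough to locate and construct a class dynamically. The sketch below shows one way such a lookup could work. It is not Qlib's actual loader: build_model is a hypothetical helper, and a standard-library class stands in for the model so the example runs without Qlib installed.

```python
import importlib


def build_model(model_config):
    """Import `module_path`, look up `class`, and instantiate it with `args`."""
    module = importlib.import_module(model_config["module_path"])
    model_class = getattr(module, model_config["class"])
    return model_class(**model_config.get("args", {}))


# Stand-in config: a standard-library class replaces the Qlib model so the
# sketch runs without Qlib installed.
demo_config = {
    "class": "Fraction",
    "module_path": "fractions",
    "args": {"numerator": 3, "denominator": 4},
}

instance = build_model(demo_config)
print(type(instance).__name__, instance)  # Fraction 3/4
```

With this kind of lookup, a custom model only needs to be importable under the module_path given in the configuration file.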
Users can find the configuration files of the baseline Model in qlib/examples/estimator/estimator_config.yaml and qlib/examples/estimator/estimator_config_dnn.yaml.
Model Testing
Assuming that the configuration file is examples/estimator/estimator_config.yaml, users can run the following commands to test the custom model:

    cd examples  # Avoid running the program from a directory that contains `qlib`
    estimator -c estimator/estimator_config.yaml

Note
estimator is a built-in command of Qlib.
Model can also be tested as a single module. An example is given in examples/train_backtest_analyze.ipynb.
Reference
To learn more about Model, please refer to Interday Model: Model Training & Prediction and Model API.