Interday Model: Model Training & Prediction¶
Introduction¶
Interday Model is designed to produce prediction scores for stocks. Users can run the Interday Model in an automated workflow through Estimator; please refer to Estimator.
Because the components of Qlib are loosely coupled, the Interday Model can also be used as an independent module.
Base Class & Interface¶
Qlib provides a base class qlib.contrib.model.base.Model from which all models should inherit.
The base class provides the following interfaces:
- __init__(**kwargs)
- Initialization.
- If users use Estimator to start an experiment, the parameters of the __init__ method should be consistent with the hyperparameters in the configuration file.
- fit(self, x_train, y_train, x_valid, y_valid, w_train=None, w_valid=None, **kwargs)
Train model.
- Parameter:
- x_train, pd.DataFrame type, train feature
The following example explains the value of x_train:
                           KMID      KLEN     KMID2       KUP      KUP2
instrument datetime
SH600004   2012-01-04  0.000000  0.017685  0.000000  0.012862  0.727275
           2012-01-05 -0.006473  0.025890 -0.250001  0.012945  0.499998
           2012-01-06  0.008117  0.019481  0.416666  0.008117  0.416666
           2012-01-09  0.016051  0.025682  0.624998  0.006421  0.250001
           2012-01-10  0.017323  0.026772  0.647057  0.003150  0.117648
...                         ...       ...       ...       ...       ...
SZ300273   2014-12-25 -0.005295  0.038697 -0.136843  0.016293  0.421052
           2014-12-26 -0.022486  0.041701 -0.539215  0.002453  0.058824
           2014-12-29 -0.031526  0.039092 -0.806451  0.000000  0.000000
           2014-12-30 -0.010000  0.032174 -0.310811  0.013913  0.432433
           2014-12-31  0.010917  0.020087  0.543479  0.001310  0.065216
x_train is a pandas DataFrame, whose index is MultiIndex <instrument(str), datetime(pd.Timestamp)>. Each column of x_train corresponds to a feature, and the column name is the feature name.
Note
The number and names of the columns are determined by the data handler; please refer to Data Handler and Estimator Data.
- y_train, pd.DataFrame type, train label
The following example explains the value of y_train:
                          LABEL
instrument datetime
SH600004   2012-01-04 -0.798456
           2012-01-05 -1.366716
           2012-01-06 -0.491026
           2012-01-09  0.296900
           2012-01-10  0.501426
...                         ...
SZ300273   2014-12-25 -0.465540
           2014-12-26  0.233864
           2014-12-29  0.471368
           2014-12-30  0.411914
           2014-12-31  1.342723
y_train is a pandas DataFrame, whose index is MultiIndex <instrument(str), datetime(pd.Timestamp)>. The LABEL column represents the value of train label.
Note
The number and names of the columns are determined by the Data Handler; please refer to Data Handler.
- x_valid, pd.DataFrame type, validation feature
The format of x_valid is the same as that of x_train.
- y_valid, pd.DataFrame type, validation label
The format of y_valid is the same as that of y_train.
- w_train (optional, default is None), pd.DataFrame type, train weight
w_train is a pandas DataFrame whose shape and index are the same as those of x_train. Each float value in w_train is the weight of the feature at the same position in x_train.
- w_valid (optional, default is None), pd.DataFrame type, validation weight
w_valid is a pandas DataFrame whose shape and index are the same as those of x_valid. Each float value in w_valid is the weight of the feature at the same position in x_valid.
- predict(self, x_test, **kwargs)
- Predict labels for the test data x_test.
- Parameter:
- x_test, pd.DataFrame type, test features
- The format of x_test is the same as that of x_train in the fit method.
- Return:
- label, np.ndarray type, test label
- The labels of x_test as predicted by the model.
- score(self, x_test, y_test, w_test=None, **kwargs)
- Evaluate the model with test features and labels.
- Parameter:
- x_test, pd.DataFrame type, test feature
- The format of x_test is the same as that of x_train in the fit method.
- y_test, pd.DataFrame type, test label
- The format of y_test is the same as that of y_train in the fit method.
- w_test (optional, default is None), pd.DataFrame type, test weight
- The format of w_test is the same as that of w_train in the fit method.
- Return: float type, evaluation score
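The data layout described for fit can be illustrated with a small pandas sketch. The values below are random toy data rather than real Qlib features, and only two of the feature columns from the example above are reproduced; the point is the MultiIndex <instrument, datetime> shared by the feature, label, and weight frames.

```python
import numpy as np
import pandas as pd

# Toy data only: two instruments, three trading days, two feature columns.
index = pd.MultiIndex.from_product(
    [
        ["SH600004", "SZ300273"],
        pd.to_datetime(["2012-01-04", "2012-01-05", "2012-01-06"]),
    ],
    names=["instrument", "datetime"],
)
rng = np.random.default_rng(0)

# Features: one column per feature, column name = feature name.
x_train = pd.DataFrame(rng.normal(size=(6, 2)), index=index, columns=["KMID", "KLEN"])

# Labels: a single LABEL column on the same index.
y_train = pd.DataFrame(rng.normal(size=(6, 1)), index=index, columns=["LABEL"])

# Weights: same shape and index as x_train, as described above (all 1.0 here).
w_train = pd.DataFrame(1.0, index=index, columns=x_train.columns)

# Selecting one instrument returns its rows indexed by datetime.
one_stock = x_train.loc["SH600004"]
print(one_stock)
```

Frames built this way can be passed to any model that follows the fit signature above; a real data handler would produce the same structure with many more rows and columns.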
For other interfaces such as save, load, finetune, please refer to Model API.
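As a rough illustration of the fit / predict / score signatures, here is a minimal sketch of a class that follows them. LinearSketchModel is a hypothetical name, it does not inherit from qlib.contrib.model.base.Model (so that the snippet runs without Qlib installed), it ignores the weight arguments, and using mean squared error as the evaluation score is an assumption rather than Qlib's definition.

```python
import numpy as np
import pandas as pd


class LinearSketchModel:
    """Hypothetical least-squares model mirroring the documented signatures."""

    def __init__(self, **kwargs):
        # Hyperparameters arrive as keyword arguments; "l2" is an illustrative one.
        self.l2 = kwargs.get("l2", 0.0)
        self.coef_ = None

    def fit(self, x_train, y_train, x_valid, y_valid, w_train=None, w_valid=None, **kwargs):
        # Add an intercept column and solve the (ridge) normal equations.
        # Validation data and weights are accepted but unused in this sketch.
        X = np.column_stack([np.ones(len(x_train)), x_train.values])
        y = y_train.values.ravel()
        A = X.T @ X + self.l2 * np.eye(X.shape[1])
        self.coef_ = np.linalg.solve(A, X.T @ y)
        return self

    def predict(self, x_test, **kwargs):
        if self.coef_ is None:
            raise ValueError("model is not fitted yet")
        X = np.column_stack([np.ones(len(x_test)), x_test.values])
        return X @ self.coef_  # np.ndarray, one value per row of x_test

    def score(self, x_test, y_test, w_test=None, **kwargs):
        # Assumption: the score is the mean squared error (lower is better).
        err = self.predict(x_test) - y_test.values.ravel()
        return float(np.mean(err ** 2))


# Toy data with an exact linear relationship between features and label.
index = pd.MultiIndex.from_product(
    [["SH600004", "SZ300273"], pd.date_range("2012-01-04", periods=6, freq="D")],
    names=["instrument", "datetime"],
)
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(12, 2)), index=index, columns=["KMID", "KLEN"])
y = pd.DataFrame({"LABEL": 2.0 * x["KMID"] - x["KLEN"] + 0.5})

# Split by date: the first four days for training, the last two for validation.
dates = x.index.get_level_values("datetime")
is_train = dates < pd.Timestamp("2012-01-08")
x_train, y_train = x[is_train], y[is_train]
x_valid, y_valid = x[~is_train], y[~is_train]

model = LinearSketchModel(l2=0.0)
model.fit(x_train, y_train, x_valid, y_valid)
pred = model.predict(x_valid)
mse = model.score(x_valid, y_valid)
```

A model intended for use inside Qlib would subclass the base class and implement the remaining interfaces as well; this sketch only shows how the three methods fit together.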
Example¶
Qlib provides LightGBM and DNN models as baselines. The following steps show how to run LightGBM as an independent module.
- Initialize Qlib with qlib.init first; please refer to initialization.
- Run the following code to get the prediction score pred_score:
    import numpy as np
    import pandas as pd

    from qlib.contrib.estimator.handler import QLibDataHandlerClose
    from qlib.contrib.model.gbdt import LGBModel

    # Stock pool passed to the data handler. The original snippet leaves MARKET
    # undefined; "csi300" is only an example value, replace it with your own.
    MARKET = "csi300"

    DATA_HANDLER_CONFIG = {
        "dropna_label": True,
        "start_date": "2007-01-01",
        "end_date": "2020-08-01",
        "market": MARKET,
    }

    TRAINER_CONFIG = {
        "train_start_date": "2007-01-01",
        "train_end_date": "2014-12-31",
        "validate_start_date": "2015-01-01",
        "validate_end_date": "2016-12-31",
        "test_start_date": "2017-01-01",
        "test_end_date": "2020-08-01",
    }

    # Load the data and split it into train / validation / test sets by date.
    x_train, y_train, x_validate, y_validate, x_test, y_test = QLibDataHandlerClose(
        **DATA_HANDLER_CONFIG
    ).get_split_data(**TRAINER_CONFIG)

    MODEL_CONFIG = {
        "loss": "mse",
        "colsample_bytree": 0.8879,
        "learning_rate": 0.0421,
        "subsample": 0.8789,
        "lambda_l1": 205.6999,
        "lambda_l2": 580.9768,
        "max_depth": 8,
        "num_leaves": 210,
        "num_threads": 20,
    }

    # use default model
    # custom Model, refer to: TODO: Model API url
    model = LGBModel(**MODEL_CONFIG)
    model.fit(x_train, y_train, x_validate, y_validate)

    # predict is documented above as returning an np.ndarray, so attach the
    # index of x_test to get one score per <instrument, datetime> row.
    _pred = model.predict(x_test)
    pred_score = pd.DataFrame({"score": np.asarray(_pred).ravel()}, index=x_test.index)
Note
QLibDataHandlerClose is the data handler provided by Qlib; please refer to Data Handler.
The above example is also available in examples/train_backtest_analyze.ipynb.
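To make the role of the date ranges in TRAINER_CONFIG more concrete, the sketch below splits a toy MultiIndex frame into train, validation, and test segments with plain pandas. slice_by_date is a hypothetical helper written for this illustration; it is not Qlib's get_split_data, and treating both end dates as inclusive is an assumption.

```python
import numpy as np
import pandas as pd


def slice_by_date(df, start_date, end_date):
    """Return rows whose datetime level lies in [start_date, end_date] (inclusive)."""
    dates = df.index.get_level_values("datetime")
    mask = (dates >= pd.Timestamp(start_date)) & (dates <= pd.Timestamp(end_date))
    return df[mask]


# Toy frame: two instruments, six dates spread over the three segments.
index = pd.MultiIndex.from_product(
    [
        ["SH600004", "SZ300273"],
        pd.to_datetime(
            ["2014-12-30", "2014-12-31", "2015-01-05", "2016-12-30", "2017-01-03", "2020-07-31"]
        ),
    ],
    names=["instrument", "datetime"],
)
features = pd.DataFrame(np.arange(24.0).reshape(12, 2), index=index, columns=["KMID", "KLEN"])

# Same date ranges as TRAINER_CONFIG in the example above.
x_train = slice_by_date(features, "2007-01-01", "2014-12-31")
x_validate = slice_by_date(features, "2015-01-01", "2016-12-31")
x_test = slice_by_date(features, "2017-01-01", "2020-08-01")

print(len(x_train), len(x_validate), len(x_test))
```

Because the three ranges do not overlap, every row lands in exactly one segment, which keeps validation and test data out of the training set.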
Custom Model¶
Qlib supports custom models. Users who want to customize their own models and integrate them into Qlib should refer to Custom Model Integration.