Forecast Model: Model Training & Prediction
Introduction
Forecast Model is designed to produce prediction scores for stocks. Users can run the Forecast Model in an automated workflow with qrun; please refer to Workflow: Workflow Management.
Because the components in Qlib are loosely coupled, Forecast Model can also be used as an independent module.
Base Class & Interface
Qlib provides a base class qlib.model.base.Model from which all models should inherit.
The base class provides the following interfaces:
- class qlib.model.base.Model
Learnable Models
- fit(dataset: qlib.data.dataset.Dataset, reweighter: qlib.data.dataset.weight.Reweighter)
Learn model from the base model.
Note
The attribute names of a learned model should not start with ‘_’, so that the model can be dumped to disk.
The following code example shows how to retrieve x_train, y_train and w_train from the dataset:

    # get features and labels
    df_train, df_valid = dataset.prepare(
        ["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
    )
    x_train, y_train = df_train["feature"], df_train["label"]
    x_valid, y_valid = df_valid["feature"], df_valid["label"]

    # get weights
    try:
        wdf_train, wdf_valid = dataset.prepare(
            ["train", "valid"], col_set=["weight"], data_key=DataHandlerLP.DK_L
        )
        w_train, w_valid = wdf_train["weight"], wdf_valid["weight"]
    except KeyError as e:
        w_train = pd.DataFrame(np.ones_like(y_train.values), index=y_train.index)
        w_valid = pd.DataFrame(np.ones_like(y_valid.values), index=y_valid.index)

Parameters: dataset (Dataset) – the dataset that generates the processed data for model training.
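
The note above about attribute names can be illustrated with a minimal sketch. It assumes a serialization rule that keeps only attributes whose names do not start with an underscore; SketchModel is a hypothetical stand-in written for this example, not Qlib's base class, and Qlib's own dumping logic may differ in detail.

```python
import pickle


class SketchModel:
    """Stand-in for a learnable model; NOT Qlib's actual base class."""

    def __init__(self):
        self.coef = None    # public name: kept when the model is dumped
        self._cache = None  # underscore-prefixed name: dropped when dumped

    def fit(self, xs, ys):
        # Least-squares slope through the origin, used as a toy learned state.
        self.coef = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        self._cache = list(zip(xs, ys))  # scratch data that is not needed on disk

    def __getstate__(self):
        # Assumed rule: keep only attributes whose names do not start with "_".
        return {k: v for k, v in self.__dict__.items() if not k.startswith("_")}


model = SketchModel()
model.fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])

# Round-trip through pickle, as dumping to disk and loading back would do.
restored = pickle.loads(pickle.dumps(model))
print(restored.coef)                # the learned parameter survives
print(hasattr(restored, "_cache"))  # the underscore attribute does not
```

Under this assumed rule, anything a model learns during fit and needs at prediction time belongs in an attribute with a public name.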
- predict(dataset: qlib.data.dataset.Dataset, segment: Union[str, slice] = 'test') → object
Give prediction given Dataset.
Parameters:
- dataset (Dataset) – the dataset that generates the processed data for the model.
- segment (Text or slice) – the segment the dataset uses to prepare data (default: test).
Returns: Prediction results of a specific type, such as pandas.Series.
Qlib also provides a base class qlib.model.base.ModelFT, which includes the method for finetuning the model.
For other interfaces such as finetune, please refer to Model API.
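
The fit/predict contract described above can be sketched with standard-library stand-ins. Everything in this example is hypothetical: BaseModelSketch only mirrors the shape of the interface, ToyDataset.prepare is far simpler than the real Dataset.prepare, and MeanLabelModel is a deliberately trivial learner. A real model would inherit from qlib.model.base.Model instead.

```python
from abc import ABC, abstractmethod
from statistics import mean


class BaseModelSketch(ABC):
    """Hypothetical stand-in mirroring the fit/predict interface."""

    def fit(self, dataset):
        raise NotImplementedError

    @abstractmethod
    def predict(self, dataset, segment="test"):
        ...


class ToyDataset:
    """Hypothetical dataset: maps a segment name to (features, labels)."""

    def __init__(self, segments):
        self.segments = segments

    def prepare(self, segment):
        return self.segments[segment]


class MeanLabelModel(BaseModelSketch):
    """Predicts the mean training label for every row."""

    def __init__(self):
        self.mean_label = None  # public name, so it would survive dumping

    def fit(self, dataset):
        _, labels = dataset.prepare("train")
        self.mean_label = mean(labels)

    def predict(self, dataset, segment="test"):
        if self.mean_label is None:
            raise ValueError("model is not fitted yet")
        features, _ = dataset.prepare(segment)
        return [self.mean_label for _ in features]


dataset = ToyDataset({
    "train": ([[0.1], [0.2], [0.3]], [1.0, 2.0, 3.0]),
    "test": ([[0.4], [0.5]], [None, None]),
})
model = MeanLabelModel()
model.fit(dataset)
predictions = model.predict(dataset)  # uses the default segment, "test"
print(predictions)
```

The sketch keeps the same division of labor as the interface: the dataset decides what data each segment contains, and the model only asks for a segment by name.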
Example
Qlib’s Model Zoo includes models such as LightGBM, MLP, LSTM, etc. These models are treated as the baselines of Forecast Model. The following steps show how to run LightGBM as an independent module.
- Initialize Qlib with qlib.init first; please refer to Initialization.
- Run the following code to get the prediction score pred_score:

    from qlib.contrib.model.gbdt import LGBModel
    from qlib.contrib.data.handler import Alpha158
    from qlib.utils import init_instance_by_config, flatten_dict
    from qlib.workflow import R
    from qlib.workflow.record_temp import SignalRecord, PortAnaRecord

    market = "csi300"
    benchmark = "SH000300"

    data_handler_config = {
        "start_time": "2008-01-01",
        "end_time": "2020-08-01",
        "fit_start_time": "2008-01-01",
        "fit_end_time": "2014-12-31",
        "instruments": market,
    }

    task = {
        "model": {
            "class": "LGBModel",
            "module_path": "qlib.contrib.model.gbdt",
            "kwargs": {
                "loss": "mse",
                "colsample_bytree": 0.8879,
                "learning_rate": 0.0421,
                "subsample": 0.8789,
                "lambda_l1": 205.6999,
                "lambda_l2": 580.9768,
                "max_depth": 8,
                "num_leaves": 210,
                "num_threads": 20,
            },
        },
        "dataset": {
            "class": "DatasetH",
            "module_path": "qlib.data.dataset",
            "kwargs": {
                "handler": {
                    "class": "Alpha158",
                    "module_path": "qlib.contrib.data.handler",
                    "kwargs": data_handler_config,
                },
                "segments": {
                    "train": ("2008-01-01", "2014-12-31"),
                    "valid": ("2015-01-01", "2016-12-31"),
                    "test": ("2017-01-01", "2020-08-01"),
                },
            },
        },
    }

    # model initialization
    model = init_instance_by_config(task["model"])
    dataset = init_instance_by_config(task["dataset"])

    # start exp
    with R.start(experiment_name="workflow"):
        # train
        R.log_params(**flatten_dict(task))
        model.fit(dataset)

        # prediction
        recorder = R.get_recorder()
        sr = SignalRecord(model, dataset, recorder)
        sr.generate()

Note
Alpha158 is the data handler provided by Qlib; please refer to Data Handler. SignalRecord is the Record Template in Qlib; please refer to Workflow.
Also, the above example is available in examples/train_backtest_analyze.ipynb.
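
The task dictionary in the example describes each component with a "class", a "module_path" and "kwargs". A minimal sketch of how such a config could be turned into an object is shown here. build_from_config is a hypothetical helper written for illustration and is not Qlib's init_instance_by_config, which may handle more cases; a standard-library class stands in for the model and dataset classes.

```python
import importlib


def build_from_config(config):
    """Hypothetical helper: import the module, look up the class, call it with kwargs."""
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    return cls(**config.get("kwargs", {}))


# A standard-library class stands in for a model or dataset class.
config = {
    "class": "Fraction",
    "module_path": "fractions",
    "kwargs": {"numerator": 3, "denominator": 4},
}

obj = build_from_config(config)
print(type(obj).__name__, obj)
```

Describing components this way keeps the whole experiment setup in plain data, which is what allows the task dictionary to be flattened and logged as parameters.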
Technically, the meaning of the model prediction depends on the label setting designed by the user.
By default, the score is the forecasting model's rating of each instrument: the higher the score, the more profitable the instrument is expected to be.
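
One common way to act on such scores is to rank instruments and keep the highest-rated ones. The sketch below assumes that higher scores are better, as in the default label setting; the instrument names and score values are made up for illustration, and top_k is a hypothetical helper, not part of Qlib.

```python
# Hypothetical prediction scores for one trading day (names and values are made up).
scores = {
    "STOCK_A": 0.012,
    "STOCK_B": 0.034,
    "STOCK_C": -0.008,
    "STOCK_D": 0.021,
}


def top_k(scores, k):
    """Return the k instrument names with the highest scores, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


picks = top_k(scores, 2)
print(picks)
```

If a custom label reverses the meaning of the score, the ranking direction would need to be reversed as well.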
Custom Model
Qlib supports custom models. If users are interested in customizing their own models and integrating the models into Qlib, please refer to Custom Model Integration.