Interday Model: Model Training & Prediction

Introduction

Interday Model is designed to make the prediction score about stocks. Users can use the Interday Model in an automatic workflow by Estimator, please refer to Estimator.

Because the components in Qlib are designed in a loosely-coupled way, Interday Model can be used as an independent module also.

Base Class & Interface

Qlib provides a base class qlib.contrib.model.base.Model from which all models should inherit.

The base class provides the following interfaces:

  • __init__(**kwargs)
    • Initialization.
    • If users use Estimator to start an experiment, the parameter of __init__ method shoule be consistent with the hyperparameters in the configuration file.
  • fit(self, x_train, y_train, x_valid, y_valid, w_train=None, w_valid=None, **kwargs)
    • Train model.

    • Parameter:
      • x_train, pd.DataFrame type, train feature

        The following example explains the value of x_train:

                                KMID      KLEN      KMID2     KUP       KUP2
        instrument  datetime
        SH600004    2012-01-04  0.000000  0.017685  0.000000  0.012862  0.727275
                    2012-01-05 -0.006473  0.025890 -0.250001  0.012945  0.499998
                    2012-01-06  0.008117  0.019481  0.416666  0.008117  0.416666
                    2012-01-09  0.016051  0.025682  0.624998  0.006421  0.250001
                    2012-01-10  0.017323  0.026772  0.647057  0.003150  0.117648
        ...                         ...       ...       ...       ...       ...
        SZ300273    2014-12-25 -0.005295  0.038697 -0.136843  0.016293  0.421052
                    2014-12-26 -0.022486  0.041701 -0.539215  0.002453  0.058824
                    2014-12-29 -0.031526  0.039092 -0.806451  0.000000  0.000000
                    2014-12-30 -0.010000  0.032174 -0.310811  0.013913  0.432433
                    2014-12-31  0.010917  0.020087  0.543479  0.001310  0.065216
        

        x_train is a pandas DataFrame, whose index is MultiIndex <instrument(str), datetime(pd.Timestamp)>. Each column of x_train corresponds to a feature, and the column name is the feature name.

        Note

        The number and names of the columns are determined by the data handler, please refer to Data Handler and Estimator Data.

      • y_train, pd.DataFrame type, train label

        The following example explains the value of y_train:

                                LABEL
        instrument  datetime
        SH600004    2012-01-04 -0.798456
                    2012-01-05 -1.366716
                    2012-01-06 -0.491026
                    2012-01-09  0.296900
                    2012-01-10  0.501426
        ...                         ...
        SZ300273    2014-12-25 -0.465540
                    2014-12-26  0.233864
                    2014-12-29  0.471368
                    2014-12-30  0.411914
                    2014-12-31  1.342723
        

        y_train is a pandas DataFrame, whose index is MultiIndex <instrument(str), datetime(pd.Timestamp)>. The LABEL column represents the value of train label.

        Note

        The number and names of the columns are determined by the Data Handler, please refer to Data Handler.

      • x_valid, pd.DataFrame type, validation feature

        The format of x_valid is same as x_train

      • y_valid, pd.DataFrame type, validation label

        The format of y_valid is same as y_train

      • `w_train`(Optional args, default is None), pd.DataFrame type, train weight

        w_train is a pandas DataFrame, whose shape and index is same as x_train. The float value in w_train represents the weight of the feature at the same position in x_train.

      • `w_train`(Optional args, default is None), pd.DataFrame type, validation weight

        w_train is a pandas DataFrame, whose shape and index is the same as x_valid. The float value in w_train represents the weight of the feature at the same position in x_train.

  • predict(self, x_test, **kwargs)
    • Predict test data ‘x_test’
    • Parameter:
      • x_test, pd.DataFrame type, test features
        The form of x_test is same as x_train in ‘fit’ method.
    • Return:
      • label, np.ndarray type, test label
        The label of x_test that predicted by model.
  • score(self, x_test, y_test, w_test=None, **kwargs)
    • Evaluate model with test feature/label
    • Parameter:
      • x_test, pd.DataFrame type, test feature
        The format of x_test is same as x_train in fit method.
      • x_test, pd.DataFrame type, test label
        The format of y_test is same as y_train in fit method.
      • w_test, pd.DataFrame type, test weight
        The format of w_test is same as w_train in fit method.
    • Return: float type, evaluation score

For other interfaces such as save, load, finetune, please refer to Model API.

Example

Qlib provides LightGBM and DNN models as the baseline, the following steps show how to run`` LightGBM`` as an independent module.

  • Initialize Qlib with qlib.init first, please refer to initialization.

  • Run the following code to get the prediction score pred_score
    from qlib.contrib.estimator.handler import QLibDataHandlerClose
    from qlib.contrib.model.gbdt import LGBModel
    
    DATA_HANDLER_CONFIG = {
        "dropna_label": True,
        "start_date": "2007-01-01",
        "end_date": "2020-08-01",
        "market": MARKET,
    }
    
    TRAINER_CONFIG = {
        "train_start_date": "2007-01-01",
        "train_end_date": "2014-12-31",
        "validate_start_date": "2015-01-01",
        "validate_end_date": "2016-12-31",
        "test_start_date": "2017-01-01",
        "test_end_date": "2020-08-01",
    }
    
    x_train, y_train, x_validate, y_validate, x_test, y_test = QLibDataHandlerClose(
        **DATA_HANDLER_CONFIG
    ).get_split_data(**TRAINER_CONFIG)
    
    
    MODEL_CONFIG = {
        "loss": "mse",
        "colsample_bytree": 0.8879,
        "learning_rate": 0.0421,
        "subsample": 0.8789,
        "lambda_l1": 205.6999,
        "lambda_l2": 580.9768,
        "max_depth": 8,
        "num_leaves": 210,
        "num_threads": 20,
    }
    # use default model
    # custom Model, refer to: TODO: Model API url
    model = LGBModel(**MODEL_CONFIG)
    model.fit(x_train, y_train, x_validate, y_validate)
    _pred = model.predict(x_test)
    pred_score = pd.DataFrame(index=_pred.index)
    pred_score["score"] = _pred.iloc(axis=1)[0]
    

    Note

    QLibDataHandlerClose is the data handler provided by Qlib, please refer to Data Handler.

Also, the above example has been given in examples/train_backtest_analyze.ipynb.

Custom Model

Qlib supports custom models. If users are interested in customizing their own models and integrating the models into Qlib, please refer to Custom Model Integration.

API

Please refer to Model API.