API Reference

Here you can find all Qlib interfaces.

Data

Provider

class qlib.data.data.CalendarProvider

Calendar provider base class

Provide calendar data.

calendar(start_time=None, end_time=None, freq='day', future=False)

Get calendar of certain market in given time range.

Parameters:
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • freq (str) – time frequency, available: year/quarter/month/week/day.
  • future (bool) – whether to include future trading days.
Returns:

calendar list

Return type:

list

locate_index(start_time, end_time, freq, future)

Locate the start time index and end time index in a calendar under certain frequency.

Parameters:
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • freq (str) – time frequency, available: year/quarter/month/week/day.
  • future (bool) – whether to include future trading days.
Returns:

  • pd.Timestamp – the real start time.
  • pd.Timestamp – the real end time.
  • int – the index of start time.
  • int – the index of end time.
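A minimal sketch of the lookup that locate_index describes, assuming the calendar is a sorted list and using ISO date strings in place of pd.Timestamp. The helper below and its bisect-based search are illustrative assumptions, not Qlib's implementation.

```python
from bisect import bisect_left, bisect_right


def locate_index(calendar, start_time, end_time):
    """Return (real_start, real_end, start_index, end_index) for a sorted calendar."""
    # First trading day on or after start_time.
    start_index = bisect_left(calendar, start_time)
    # Last trading day on or before end_time.
    end_index = bisect_right(calendar, end_time) - 1
    if start_index >= len(calendar) or end_index < 0 or start_index > end_index:
        raise ValueError("no trading day in the requested range")
    return calendar[start_index], calendar[end_index], start_index, end_index


# ISO date strings sort chronologically, so plain strings are enough here.
calendar = ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]
result = locate_index(calendar, "2020-01-04", "2020-01-07")
print(result)
```

The requested start falls on a non-trading day, so the "real" start time is the next calendar entry.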

class qlib.data.data.InstrumentProvider

Instrument provider base class

Provide instrument data.

static instruments(market='all', filter_pipe=None)

Get the general config dictionary for a base market adding several dynamic filters.

Parameters:
  • market (str) – market/industry/index shortname, e.g. all/sse/szse/sse50/csi300/csi500.
  • filter_pipe (list) – the list of dynamic filters.
Returns:

dict of stockpool config. {`market`=>base market name, `filter_pipe`=>list of filters}

example :

Return type:

dict
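As a rough illustration of the returned structure, the sketch below builds a dict with the two keys named above. The helper name and the layout of a serialized filter are hypothetical; only the `market` and `filter_pipe` keys come from the description.

```python
def build_stockpool_config(market="all", filter_pipe=None):
    """Return a stockpool config with the two keys named in the docs."""
    # Copy the filter list so later changes by the caller do not leak in.
    return {"market": market, "filter_pipe": list(filter_pipe or [])}


plain = build_stockpool_config("csi300")
# The layout of a serialized filter below is hypothetical.
filtered = build_stockpool_config("csi300", [{"name_rule_re": "SH[0-9]{6}"}])
print(plain)
print(filtered)
```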

list_instruments(instruments, start_time=None, end_time=None, freq='day', as_list=False)

List the instruments based on a certain stockpool config.

Parameters:
  • instruments (dict) – stockpool config.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • as_list (bool) – return instruments as list or dict.
Returns:

instruments list or dictionary with time spans

Return type:

dict or list

class qlib.data.data.FeatureProvider

Feature provider class

Provide feature data.

feature(instrument, field, start_time, end_time, freq)

Get feature data.

Parameters:
  • instrument (str) – a certain instrument.
  • field (str) – a certain field of feature.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • freq (str) – time frequency, available: year/quarter/month/week/day.
Returns:

data of a certain feature

Return type:

pd.Series

class qlib.data.data.ExpressionProvider

Expression provider class

Provide Expression data.

__init__()

Initialize self. See help(type(self)) for accurate signature.

expression(instrument, field, start_time=None, end_time=None, freq='day')

Get Expression data.

Parameters:
  • instrument (str) – a certain instrument.
  • field (str) – a certain field of feature.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • freq (str) – time frequency, available: year/quarter/month/week/day.
Returns:

data of a certain expression

Return type:

pd.Series

class qlib.data.data.DatasetProvider

Dataset provider class

Provide Dataset data.

dataset(instruments, fields, start_time=None, end_time=None, freq='day')

Get dataset data.

Parameters:
  • instruments (list or dict) – list/dict of instruments or dict of stockpool config.
  • fields (list) – list of feature instances.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • freq (str) – time frequency.
Returns:

a pandas dataframe with <instrument, datetime> index.

Return type:

pd.DataFrame

static get_instruments_d(instruments, freq)

Parse different types of input instruments into the output instruments_d. An incorrectly formatted instruments input will raise an exception.

static get_column_names(fields)

Get column names from input fields

static dataset_processor(instruments_d, column_names, start_time, end_time, freq)

Load and process the data, then return the dataset. By default, the multi-kernel method is used.

static expression_calculator(inst, start_time, end_time, freq, column_names, spans=None, g_config=None)

Calculate the expressions for one instrument, return a df result. If the expression has been calculated before, load from cache.

Returns:

a data frame with index ‘datetime’ and other data columns.

class qlib.data.data.LocalCalendarProvider(**kwargs)

Local calendar data provider class

Provide calendar data from local data source.

__init__(**kwargs)

Initialize self. See help(type(self)) for accurate signature.

load_calendar(freq, future)

Load original calendar timestamp from file.

Parameters:
  • freq (str) – frequency of the calendar file to read.
Returns:

list of timestamps

Return type:

list

calendar(start_time=None, end_time=None, freq='day', future=False)

Get calendar of certain market in given time range.

Parameters:
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • freq (str) – time frequency, available: year/quarter/month/week/day.
  • future (bool) – whether to include future trading days.
Returns:

calendar list

Return type:

list

class qlib.data.data.LocalInstrumentProvider

Local instrument data provider class

Provide instrument data from local data source.

__init__()

Initialize self. See help(type(self)) for accurate signature.

list_instruments(instruments, start_time=None, end_time=None, freq='day', as_list=False)

List the instruments based on a certain stockpool config.

Parameters:
  • instruments (dict) – stockpool config.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • as_list (bool) – return instruments as list or dict.
Returns:

instruments list or dictionary with time spans

Return type:

dict or list

class qlib.data.data.LocalFeatureProvider(**kwargs)

Local feature data provider class

Provide feature data from local data source.

__init__(**kwargs)

Initialize self. See help(type(self)) for accurate signature.

feature(instrument, field, start_index, end_index, freq)

Get feature data.

Parameters:
  • instrument (str) – a certain instrument.
  • field (str) – a certain field of feature.
  • start_index (int) – start index of the range in the calendar.
  • end_index (int) – end index of the range in the calendar.
  • freq (str) – time frequency, available: year/quarter/month/week/day.
Returns:

data of a certain feature

Return type:

pd.Series

class qlib.data.data.LocalExpressionProvider

Local expression data provider class

Provide expression data from local data source.

__init__()

Initialize self. See help(type(self)) for accurate signature.

expression(instrument, field, start_time=None, end_time=None, freq='day')

Get Expression data.

Parameters:
  • instrument (str) – a certain instrument.
  • field (str) – a certain field of feature.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • freq (str) – time frequency, available: year/quarter/month/week/day.
Returns:

data of a certain expression

Return type:

pd.Series

class qlib.data.data.LocalDatasetProvider

Local dataset data provider class

Provide dataset data from local data source.

__init__()

Initialize self. See help(type(self)) for accurate signature.

dataset(instruments, fields, start_time=None, end_time=None, freq='day')

Get dataset data.

Parameters:
  • instruments (list or dict) – list/dict of instruments or dict of stockpool config.
  • fields (list) – list of feature instances.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • freq (str) – time frequency.
Returns:

a pandas dataframe with <instrument, datetime> index.

Return type:

pd.DataFrame

static multi_cache_walker(instruments, fields, start_time=None, end_time=None, freq='day')

This method is used to prepare the expression cache for the client. Then the client will load the data from expression cache by itself.

static cache_walker(inst, start_time, end_time, freq, column_names)

If the expressions of one instrument haven’t been calculated before, calculate it and write it into expression cache.

class qlib.data.data.ClientCalendarProvider

Client calendar data provider class

Provide calendar data by requesting data from server as a client.

__init__()

Initialize self. See help(type(self)) for accurate signature.

calendar(start_time=None, end_time=None, freq='day', future=False)

Get calendar of certain market in given time range.

Parameters:
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • freq (str) – time frequency, available: year/quarter/month/week/day.
  • future (bool) – whether to include future trading days.
Returns:

calendar list

Return type:

list

class qlib.data.data.ClientInstrumentProvider

Client instrument data provider class

Provide instrument data by requesting data from server as a client.

__init__()

Initialize self. See help(type(self)) for accurate signature.

list_instruments(instruments, start_time=None, end_time=None, freq='day', as_list=False)

List the instruments based on a certain stockpool config.

Parameters:
  • instruments (dict) – stockpool config.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • as_list (bool) – return instruments as list or dict.
Returns:

instruments list or dictionary with time spans

Return type:

dict or list

class qlib.data.data.ClientDatasetProvider

Client dataset data provider class

Provide dataset data by requesting data from server as a client.

__init__()

Initialize self. See help(type(self)) for accurate signature.

dataset(instruments, fields, start_time=None, end_time=None, freq='day', disk_cache=0, return_uri=False)

Get dataset data.

Parameters:
  • instruments (list or dict) – list/dict of instruments or dict of stockpool config.
  • fields (list) – list of feature instances.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
  • freq (str) – time frequency.
Returns:

a pandas dataframe with <instrument, datetime> index.

Return type:

pd.DataFrame

class qlib.data.data.BaseProvider

Local provider class

Kept for compatibility with the old qlib provider.

features(instruments, fields, start_time=None, end_time=None, freq='day', disk_cache=None)

Parameters:
  • disk_cache (int) – whether to skip (0), use (1), or replace (2) the disk cache.

This function first tries the cache method, which accepts the keyword disk_cache. If a TypeError is raised because the DatasetD instance is a provider class, it falls back to the provider method.
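The fallback described for disk_cache can be sketched as follows. The function names, the stand-in dataset callables, and the default of 1 are assumptions made for illustration; they are not Qlib's code.

```python
def features_with_fallback(dataset_fn, instruments, fields, disk_cache=None):
    """Call dataset_fn with disk_cache if it accepts it, else without."""
    disk_cache = 1 if disk_cache is None else disk_cache  # assumed default
    try:
        # A cache-aware method accepts the disk_cache keyword.
        return dataset_fn(instruments, fields, disk_cache=disk_cache)
    except TypeError:
        # A plain provider method has no disk_cache keyword.
        return dataset_fn(instruments, fields)


def cached_dataset(instruments, fields, disk_cache=0):
    # Hypothetical cache-aware dataset method.
    return ("cache", disk_cache)


def plain_dataset(instruments, fields):
    # Hypothetical provider method without cache support.
    return ("provider", None)


via_cache = features_with_fallback(cached_dataset, ["SH600000"], ["$close"], disk_cache=2)
via_provider = features_with_fallback(plain_dataset, ["SH600000"], ["$close"])
print(via_cache, via_provider)
```

One caveat of this pattern is that a TypeError raised inside the cache method itself also triggers the fallback, which can hide real bugs.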

class qlib.data.data.LocalProvider

features_uri(instruments, fields, start_time, end_time, freq, disk_cache=1)

Return the URI of the generated cache of features/dataset.

Parameters:
  • disk_cache
  • instruments
  • fields
  • start_time
  • end_time
  • freq

class qlib.data.data.ClientProvider

Client Provider

Requests data from the server as a client. It can issue the following requests:
  • Calendar: the server responds directly with a list of calendars.
  • Instruments (without filter): the server responds directly with a list/dict of instruments.
  • Instruments (with filters): the server responds with a list/dict of instruments.
  • Features: the server responds with a cache URI.

The general workflow is as follows. When the user issues a request through the client provider, the client provider connects to the server and sends the request. The client then waits for the response. The server responds immediately, indicating whether the cache is available. The wait ends only when the client gets a response saying that feature_available is true.

BUG: Every time we request data, we need to connect to the server, wait for the response, and disconnect from it. We cannot make a sequence of requests within one connection. See https://python-socketio.readthedocs.io/en/latest/client.html for the documentation of the python-socketIO client.

__init__()

Initialize self. See help(type(self)) for accurate signature.

qlib.data.data.CalendarProviderWrapper

alias of qlib.data.data.CalendarProvider

qlib.data.data.InstrumentProviderWrapper

alias of qlib.data.data.InstrumentProvider

qlib.data.data.FeatureProviderWrapper

alias of qlib.data.data.FeatureProvider

qlib.data.data.ExpressionProviderWrapper

alias of qlib.data.data.ExpressionProvider

qlib.data.data.DatasetProviderWrapper

alias of qlib.data.data.DatasetProvider

qlib.data.data.BaseProviderWrapper

alias of qlib.data.data.BaseProvider

qlib.data.data.register_all_wrappers(C)

Filter

class qlib.data.filter.BaseDFilter

Dynamic Instruments Filter Abstract class

Users can override this class to construct their own filter

Override __init__ to input filter regulations

Override filter_main to use the regulations to filter instruments

__init__()

Initialize self. See help(type(self)) for accurate signature.

static from_config(config)

Construct an instance from config dict.

Parameters:
  • config (dict) – dict of config parameters.

to_config()

Convert the instance to a config dict.

Returns:

the dict of config parameters.

Return type:

dict

class qlib.data.filter.SeriesDFilter(fstart_time=None, fend_time=None)

Dynamic Instruments Filter Abstract class to filter a series of certain features

Filters should provide parameters:

  • filter start time
  • filter end time
  • filter rule

Override __init__ to assign a certain rule to filter the series.

Override _getFilterSeries to use the rule to filter the series and get a dict of {inst => series}, or override filter_main for more advanced series filter rule

__init__(fstart_time=None, fend_time=None)

Init function for the filter base class. Filters a set of instruments based on a certain rule within the period given by fstart_time and fend_time.

Parameters:
  • fstart_time (str) – the time at which the filter rule starts filtering the instruments.
  • fend_time (str) – the time at which the filter rule stops filtering the instruments.

filter_main(instruments, start_time=None, end_time=None)

Implement this method to filter the instruments.

Parameters:
  • instruments (dict) – input instruments to be filtered.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
Returns:

filtered instruments, same structure as input instruments.

Return type:

dict
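One way to picture the series-based filtering is to turn a per-date boolean series into the time spans during which an instrument passes the rule. The sketch below assumes the instruments dict maps each instrument to a list of (start, end) spans and uses plain lists instead of pd.Series; the helper name is hypothetical.

```python
def true_spans(dates, flags):
    """Collapse a boolean series into inclusive (start, end) spans where it is True."""
    spans = []
    start = None
    prev = None
    for date, flag in zip(dates, flags):
        if flag and start is None:
            start = date  # a new span opens here
        elif not flag and start is not None:
            spans.append((start, prev))  # the span closed on the previous date
            start = None
        prev = date
    if start is not None:
        spans.append((start, prev))  # the series ended inside a span
    return spans


dates = ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07", "2020-01-08"]
passes_rule = [True, True, False, True, False]
filtered = {"SH600000": true_spans(dates, passes_rule)}
print(filtered)
```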

class qlib.data.filter.NameDFilter(name_rule_re, fstart_time=None, fend_time=None)

Name dynamic instrument filter

Filter the instruments based on a regulated name format.

A name rule regular expression is required.

__init__(name_rule_re, fstart_time=None, fend_time=None)

Init function for name filter class

Parameters:
  • name_rule_re (str) – regular expression for the name rule.

static from_config(config)

Construct an instance from config dict.

Parameters:
  • config (dict) – dict of config parameters.

to_config()

Convert the instance to a config dict.

Returns:

the dict of config parameters.

Return type:

dict

class qlib.data.filter.ExpressionDFilter(rule_expression, fstart_time=None, fend_time=None, keep=False)

Expression dynamic instrument filter

Filter the instruments based on a certain expression.

An expression rule indicating a certain feature field is required.

Examples

  • basic features filter: rule_expression = '$close/$open>5'
  • cross-sectional features filter: rule_expression = '$rank($close)<10'
  • time-sequence features filter: rule_expression = '$Ref($close, 3)>100'

__init__(rule_expression, fstart_time=None, fend_time=None, keep=False)

Init function for expression filter class

Parameters:
  • rule_expression (str) – an input expression for the rule.
  • fstart_time (str) – filter the feature starting from this time.
  • fend_time (str) – filter the feature ending at this time.
  • keep (bool) – whether to keep the instruments whose features don’t exist in the filter time span.

from_config()

Construct an instance from config dict.

Parameters:
  • config (dict) – dict of config parameters.

to_config()

Convert the instance to a config dict.

Returns:

the dict of config parameters.

Return type:

dict
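The from_config/to_config pair shared by the filter classes can be pictured as a round trip through a plain dict. The class below is a hypothetical stand-in for a name-based filter: the config key names, the use of re.fullmatch, and the shape of the instruments dict are all assumptions for illustration.

```python
import re


class NameFilterSketch:
    """Hypothetical stand-in for a name-based dynamic filter."""

    def __init__(self, name_rule_re, fstart_time=None, fend_time=None):
        self.name_rule_re = name_rule_re
        self.fstart_time = fstart_time
        self.fend_time = fend_time

    def to_config(self):
        # Key names are assumptions chosen to mirror the constructor.
        return {
            "name_rule_re": self.name_rule_re,
            "fstart_time": self.fstart_time,
            "fend_time": self.fend_time,
        }

    @staticmethod
    def from_config(config):
        return NameFilterSketch(**config)

    def filter_main(self, instruments):
        # Keep only the instruments whose name matches the rule.
        pattern = re.compile(self.name_rule_re)
        return {name: spans for name, spans in instruments.items() if pattern.fullmatch(name)}


original = NameFilterSketch("SH[0-9]{6}")
restored = NameFilterSketch.from_config(original.to_config())
pool = {
    "SH600000": [("2020-01-02", "2020-12-31")],
    "SZ000001": [("2020-01-02", "2020-12-31")],
}
kept = restored.filter_main(pool)
print(kept)
```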

Class

class qlib.data.base.Expression

Expression base class

load(instrument, start_index, end_index, freq)

Load the feature.

Parameters:
  • instrument (str) – instrument code.
  • start_index (str) – feature start index [in calendar].
  • end_index (str) – feature end index [in calendar].
  • freq (str) – feature frequency.
Returns:

feature series: The index of the series is the calendar index

Return type:

pd.Series

get_longest_back_rolling()

Get the longest length of historical data that the feature has accessed.

This is designed to determine, up front, the range of data needed to calculate the features in a specific range. However, cases like Ref(Ref($close, -1), 1) cannot be handled correctly.

So this is only used for detecting the length of historical data needed.

get_extended_window_size()

To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].

Returns:

lft_etd, rght_etd

Return type:

(int, int)

class qlib.data.base.Feature(name=None)

Static Expression

This kind of feature will load data from provider

__init__(name=None)

Initialize self. See help(type(self)) for accurate signature.

get_longest_back_rolling()

Get the longest length of historical data that the feature has accessed.

This is designed to determine, up front, the range of data needed to calculate the features in a specific range. However, cases like Ref(Ref($close, -1), 1) cannot be handled correctly.

So this is only used for detecting the length of historical data needed.

get_extended_window_size()

To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].

Returns:

lft_etd, rght_etd

Return type:

(int, int)

class qlib.data.base.ExpressionOps

Operator Expression

This kind of feature will use operator for feature construction on the fly.
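The window bookkeeping behind get_extended_window_size can be sketched with a few toy classes. The propagation rules below (margins that only ever grow, a rolling window of N points reading N - 1 earlier points, N >= 1) are assumptions for illustration rather than a copy of Qlib's operators.

```python
class Leaf:
    """A raw feature: it needs no data beyond the requested range."""

    def get_extended_window_size(self):
        return 0, 0


class RefSketch:
    def __init__(self, feature, n):
        self.feature, self.n = feature, n

    def get_extended_window_size(self):
        lft_etd, rght_etd = self.feature.get_extended_window_size()
        # n > 0 reads n periods back, n < 0 reads -n periods ahead.
        # Margins never shrink here, so the result is conservative.
        return max(lft_etd + self.n, lft_etd), max(rght_etd - self.n, rght_etd)


class RollingSketch:
    def __init__(self, feature, n):
        self.feature, self.n = feature, n

    def get_extended_window_size(self):
        lft_etd, rght_etd = self.feature.get_extended_window_size()
        # A window of n points ending at the current one reads n - 1 earlier points.
        return lft_etd + self.n - 1, rght_etd


rolling_of_ref = RollingSketch(RefSketch(Leaf(), 2), 5).get_extended_window_size()
nested_ref = RefSketch(RefSketch(Leaf(), -1), 1).get_extended_window_size()
print(rolling_of_ref, nested_ref)
```

In this sketch the nested Ref(Ref($close, -1), 1) case reports one extra period on each side even though the two shifts cancel, which shows why such expressions are hard to size exactly.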

Operator

class qlib.data.ops.ElemOperator(feature)

Element-wise Operator

Parameters:
  • feature (Expression) – feature instance
Returns:

feature operation output

Return type:

Expression

__init__(feature)

Initialize self. See help(type(self)) for accurate signature.

get_longest_back_rolling()

Get the longest length of historical data that the feature has accessed.

This is designed to determine, up front, the range of data needed to calculate the features in a specific range. However, cases like Ref(Ref($close, -1), 1) cannot be handled correctly.

So this is only used for detecting the length of historical data needed.

get_extended_window_size()

To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].

Returns:

lft_etd, rght_etd

Return type:

(int, int)

class qlib.data.ops.NpElemOperator(feature, func)

Numpy Element-wise Operator

Parameters:
  • feature (Expression) – feature instance
  • func (str) – numpy feature operation method
Returns:

feature operation output

Return type:

Expression

__init__(feature, func)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Abs(feature)

Feature Absolute Value

Parameters:
  • feature (Expression) – feature instance
Returns:

a feature instance with absolute output

Return type:

Expression

__init__(feature)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Sign(feature)

Feature Sign

Parameters:
  • feature (Expression) – feature instance
Returns:

a feature instance with sign

Return type:

Expression

__init__(feature)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Log(feature)

Feature Log

Parameters:
  • feature (Expression) – feature instance
Returns:

a feature instance with log

Return type:

Expression

__init__(feature)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Power(feature, exponent)

Feature Power

Parameters:
  • feature (Expression) – feature instance
Returns:

a feature instance with power

Return type:

Expression

__init__(feature, exponent)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Mask(feature, instrument)

Feature Mask

Parameters:
  • feature (Expression) – feature instance
  • instrument (str) – instrument mask
Returns:

a feature instance with masked instrument

Return type:

Expression

__init__(feature, instrument)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Not(feature)

Not Operator

Parameters:
  • feature (Expression) – feature instance
Returns:

element-wise not output of the feature

Return type:

Feature

__init__(feature)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.PairOperator(feature_left, feature_right)

Pair-wise operator

Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
  • func (str) – operator function
Returns:

two features’ operation output

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

get_longest_back_rolling()

Get the longest length of historical data that the feature has accessed.

This is designed to determine, up front, the range of data needed to calculate the features in a specific range. However, cases like Ref(Ref($close, -1), 1) cannot be handled correctly.

So this is only used for detecting the length of historical data needed.

get_extended_window_size()

To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].

Returns:

lft_etd, rght_etd

Return type:

(int, int)

class qlib.data.ops.NpPairOperator(feature_left, feature_right, func)

Numpy Pair-wise operator

Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
  • func (str) – operator function
Returns:

two features’ operation output

Return type:

Feature

__init__(feature_left, feature_right, func)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Add(feature_left, feature_right)

Add Operator

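Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

sum of the two features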

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Sub(feature_left, feature_right)

Subtract Operator

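Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

difference of the two features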

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Mul(feature_left, feature_right)

Multiply Operator

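Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

product of the two features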

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Div(feature_left, feature_right)

Division Operator

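Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

quotient of the two features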

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Greater(feature_left, feature_right)

Greater Operator

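Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

the greater elements taken from the two input features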

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Less(feature_left, feature_right)

Less Operator

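Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

the smaller elements taken from the two input features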

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Gt(feature_left, feature_right)

Greater Than Operator

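Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

bool series indicating left > right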

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Ge(feature_left, feature_right)

Greater Equal Than Operator

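Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

bool series indicating left >= right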

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Lt(feature_left, feature_right)

Less Than Operator

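Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

bool series indicating left < right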

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Le(feature_left, feature_right)

Less Equal Than Operator

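Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

bool series indicating left <= right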

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.
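Greater and Less are easy to confuse with Gt and Lt: the former pick values element by element, while the latter produce boolean series. The sketch below shows the difference on plain lists; the function names are illustrative and the real operators work on feature instances.

```python
def greater(left, right):
    """Element-wise maximum of two equally long sequences."""
    return [max(a, b) for a, b in zip(left, right)]


def less(left, right):
    """Element-wise minimum of two equally long sequences."""
    return [min(a, b) for a, b in zip(left, right)]


def gt(left, right):
    """Element-wise comparison left > right."""
    return [a > b for a, b in zip(left, right)]


def ge(left, right):
    """Element-wise comparison left >= right."""
    return [a >= b for a, b in zip(left, right)]


left = [1.0, 5.0, 3.0]
right = [2.0, 4.0, 3.0]
print(greater(left, right))
print(less(left, right))
print(gt(left, right))
print(ge(left, right))
```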

class qlib.data.ops.Eq(feature_left, feature_right)

Equal Operator

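Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

bool series indicating left == right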

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Ne(feature_left, feature_right)

Not Equal Operator

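Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

bool series indicating left != right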

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.And(feature_left, feature_right)

And Operator

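Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

row-by-row & of the two features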

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Or(feature_left, feature_right)

Or Operator

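Parameters:
  • feature_left (Expression) – feature instance or numeric value
  • feature_right (Expression) – feature instance or numeric value
Returns:

row-by-row | of the two features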

Return type:

Feature

__init__(feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.If(condition, feature_left, feature_right)

If Operator

Parameters:
  • condition (Expression) – feature instance with bool values as condition
  • feature_left (Expression) – feature instance
  • feature_right (Expression) – feature instance
__init__(condition, feature_left, feature_right)

Initialize self. See help(type(self)) for accurate signature.
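A small sketch of an element-wise conditional on plain lists. It assumes that feature_left is taken where the condition is true and feature_right otherwise; the text above does not state this, so treat the ordering as an assumption.

```python
def if_else(condition, left, right):
    """Pick left[i] where condition[i] is true, otherwise right[i]."""
    return [a if c else b for c, a, b in zip(condition, left, right)]


condition = [True, False, True]
left = [1.0, 2.0, 3.0]
right = [10.0, 20.0, 30.0]
selected = if_else(condition, left, right)
print(selected)
```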

get_longest_back_rolling()

Get the longest length of historical data that the feature has accessed.

This is designed to determine, up front, the range of data needed to calculate the features in a specific range. However, cases like Ref(Ref($close, -1), 1) cannot be handled correctly.

So this is only used for detecting the length of historical data needed.

get_extended_window_size()

To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].

Returns:

lft_etd, rght_etd

Return type:

(int, int)

class qlib.data.ops.Rolling(feature, N, func)

Rolling Operator

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
  • func (str) – rolling method
Returns:

rolling outputs

Return type:

Expression
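The rolling operators that follow share one idea: apply a function to a trailing window of N points. The sketch below works on plain lists and takes a callable, whereas the func parameter above is a method name given as a string. It also assumes N >= 1 and that windows at the start of the series are shorter rather than undefined.

```python
from statistics import mean


def rolling(values, n, func):
    """Apply func to a trailing window of up to n points ending at each position."""
    out = []
    for i in range(len(values)):
        # Assumption: windows at the start are shorter instead of undefined.
        window = values[max(0, i - n + 1): i + 1]
        out.append(func(window))
    return out


close = [1.0, 2.0, 4.0, 8.0]
rolling_mean = rolling(close, 2, mean)
rolling_sum = rolling(close, 2, sum)
rolling_max = rolling(close, 3, max)
print(rolling_mean, rolling_sum, rolling_max)
```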

__init__(feature, N, func)

Initialize self. See help(type(self)) for accurate signature.

get_longest_back_rolling()

Get the longest length of historical data that the feature has accessed.

This is designed to determine, up front, the range of data needed to calculate the features in a specific range. However, cases like Ref(Ref($close, -1), 1) cannot be handled correctly.

So this is only used for detecting the length of historical data needed.

get_extended_window_size()

To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].

Returns:

lft_etd, rght_etd

Return type:

(int, int)

class qlib.data.ops.Ref(feature, N)

Feature Reference

Parameters:
  • feature (Expression) – feature instance
  • N (int) – N = 0, retrieve the first data; N > 0, retrieve data of N periods ago; N < 0, future data
Returns:

a feature instance with target reference

Return type:

Expression
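The three cases of N can be sketched as a shift on a plain list. Using math.nan for positions that fall outside the series is an assumption of this sketch, as is the helper name.

```python
import math


def ref(values, n):
    """Shift values by n periods: n > 0 looks back, n < 0 looks ahead."""
    size = len(values)
    if n == 0:
        # N = 0: every position gets the first data point.
        return values[:1] * size
    out = []
    for i in range(size):
        j = i - n
        # Positions outside the series have no data.
        out.append(values[j] if 0 <= j < size else math.nan)
    return out


close = [10.0, 11.0, 12.0, 13.0]
lagged = ref(close, 1)    # value of one period ago
lead = ref(close, -1)     # value of one period ahead (future data)
first = ref(close, 0)     # the first data point everywhere
print(lagged, lead, first)
```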

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

get_longest_back_rolling()

Get the longest length of historical data that the feature has accessed.

This is designed to determine, up front, the range of data needed to calculate the features in a specific range. However, cases like Ref(Ref($close, -1), 1) cannot be handled correctly.

So this is only used for detecting the length of historical data needed.

get_extended_window_size()

To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].

Returns:

lft_etd, rght_etd

Return type:

(int, int)

class qlib.data.ops.Mean(feature, N)

Rolling Mean (MA)

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling average

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Sum(feature, N)

Rolling Sum

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling sum

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Std(feature, N)

Rolling Std

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling std

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Var(feature, N)

Rolling Variance

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling variance

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Skew(feature, N)

Rolling Skewness

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling skewness

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Kurt(feature, N)

Rolling Kurtosis

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling kurtosis

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Max(feature, N)

Rolling Max

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling max

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.IdxMax(feature, N)

Rolling Max Index

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling max index

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Min(feature, N)

Rolling Min

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling min

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.IdxMin(feature, N)

Rolling Min Index

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling min index

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Quantile(feature, N, qscore)

Rolling Quantile

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling quantile

Return type:

Expression

__init__(feature, N, qscore)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Med(feature, N)

Rolling Median

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling median

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Mad(feature, N)

Rolling Mean Absolute Deviation

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling mean absolute deviation

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Rank(feature, N)

Rolling Rank (Percentile)

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling rank

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Count(feature, N)

Rolling Count

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling count of number of non-NaN elements

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Delta(feature, N)

Rolling Delta

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with end minus start in rolling window

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Slope(feature, N)

Rolling Slope

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with regression slope of given window

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Rsquare(feature, N)

Rolling R-value Square

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with regression r-value square of given window

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Resi(feature, N)

Rolling Regression Residuals

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with regression residuals of given window

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.
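
The three regression operators above (Slope, Rsquare, Resi) can be read as a least-squares fit of the window values against their position in the window. The following is a minimal plain-Python sketch under that assumption. The helper names `window_regression` and `rolling_regression` are hypothetical, and the library may differ in detail, for example in how short or NaN-containing windows are treated.

```python
from typing import List, Tuple


def window_regression(window: List[float]) -> Tuple[float, float, float]:
    """Fit y = a + b * x over x = 0..n-1 (n >= 2).

    Returns (slope, r_squared, residual_of_last_point).
    """
    n = len(window)
    xs = list(range(n))
    x_mean = sum(xs) / n
    y_mean = sum(window) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, window))
    syy = sum((y - y_mean) ** 2 for y in window)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # r^2 is undefined for a constant window, so report NaN in that case
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    last_residual = window[-1] - (intercept + slope * xs[-1])
    return slope, r_squared, last_residual


def rolling_regression(series: List[float], n: int) -> List[Tuple[float, float, float]]:
    """Apply window_regression to every full window of size n."""
    return [window_regression(series[i - n + 1 : i + 1]) for i in range(n - 1, len(series))]


prices = [1.0, 2.0, 3.0, 5.0]
results = rolling_regression(prices, 3)
# The first window [1, 2, 3] is a perfect line: slope 1, r^2 1, residual 0.
```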

class qlib.data.ops.WMA(feature, N)

Rolling WMA

Parameters:
  • feature (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with weighted moving average output

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.EMA(feature, N)

Rolling Exponential Mean (EMA)

Parameters:
  • feature (Expression) – feature instance
  • N (int, float) – rolling window size
Returns:

a feature instance with exponential moving average output

Return type:

Expression

__init__(feature, N)

Initialize self. See help(type(self)) for accurate signature.
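
As an illustration of what an exponential moving average computes, here is a minimal sketch of the plain recursive form. It assumes a smoothing factor of alpha = 2 / (N + 1) for an integer window N. How the library maps N to a smoothing factor, and how a float N is interpreted, is not stated in this reference, so treat both as assumptions. The function name `ema` is hypothetical.

```python
from typing import List


def ema(series: List[float], n: int) -> List[float]:
    """Recursive exponential moving average with alpha = 2 / (n + 1) (assumed)."""
    alpha = 2.0 / (n + 1)
    out: List[float] = []
    prev = None
    for value in series:
        # the first value seeds the average; later values are blended in
        prev = value if prev is None else alpha * value + (1 - alpha) * prev
        out.append(prev)
    return out


smoothed = ema([1.0, 2.0, 3.0], 3)  # alpha = 0.5 -> [1.0, 1.5, 2.25]
```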

class qlib.data.ops.PairRolling(feature_left, feature_right, N, func)

Pair Rolling Operator

Parameters:
  • feature_left (Expression) – feature instance
  • feature_right (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling output of two input features

Return type:

Expression

__init__(feature_left, feature_right, N, func)

Initialize self. See help(type(self)) for accurate signature.

get_longest_back_rolling()

Get the longest length of historical data the feature has accessed.

This is designed to find the range of data needed before calculating the features in a specific range. However, situations like Ref(Ref($close, -1), 1) cannot be handled correctly.

So this will only be used for detecting the length of historical data needed.

get_extended_window_size()

Get the extended window size.

To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].

Returns:lft_etd, rght_etd
Return type:(int, int)
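
The relation between the requested range and the leaf range can be sketched as follows. This assumes that a backward-looking window of size N needs N - 1 extra points on the left and none on the right, and that a pair operator takes the larger extension of its two inputs. All function names are hypothetical, and the library's actual rules may differ.

```python
from typing import Tuple

Extension = Tuple[int, int]  # (lft_etd, rght_etd)


def rolling_extension(child_ext: Extension, n: int) -> Extension:
    """Extension needed by a size-n backward window on top of its input."""
    left, right = child_ext
    return left + n - 1, right


def pair_rolling_extension(left_ext: Extension, right_ext: Extension, n: int) -> Extension:
    """A pair operator must satisfy both inputs, so take the larger extension."""
    ll, lr = rolling_extension(left_ext, n)
    rl, rr = rolling_extension(right_ext, n)
    return max(ll, rl), max(lr, rr)


def leaf_range(start_index: int, end_index: int, ext: Extension) -> Tuple[int, int]:
    """Range of leaf data needed to compute the operator on [start_index, end_index]."""
    lft_etd, rght_etd = ext
    return start_index - lft_etd, end_index + rght_etd


# Left input is a raw leaf; right input already looks back 4 steps.
ext = pair_rolling_extension((0, 0), (4, 0), 5)
needed = leaf_range(100, 120, ext)
```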
class qlib.data.ops.Corr(feature_left, feature_right, N)

Rolling Correlation

Parameters:
  • feature_left (Expression) – feature instance
  • feature_right (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling correlation of two input features

Return type:

Expression

__init__(feature_left, feature_right, N)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.ops.Cov(feature_left, feature_right, N)

Rolling Covariance

Parameters:
  • feature_left (Expression) – feature instance
  • feature_right (Expression) – feature instance
  • N (int) – rolling window size
Returns:

a feature instance with rolling covariance of two input features

Return type:

Expression

__init__(feature_left, feature_right, N)

Initialize self. See help(type(self)) for accurate signature.
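
The pair operators above follow one pattern: slide a window of size N over two aligned series and apply a two-argument function to each pair of windows. The sketch below illustrates that pattern in plain Python. It assumes the sample covariance (n - 1 denominator), which this reference does not specify, and all helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def cov(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance of two equally long windows (n - 1 denominator, assumed)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def corr(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; NaN when either window is constant."""
    denom = math.sqrt(cov(xs, xs) * cov(ys, ys))
    return cov(xs, ys) / denom if denom > 0 else float("nan")


def rolling_pair(
    left: List[float],
    right: List[float],
    n: int,
    func: Callable[[Sequence[float], Sequence[float]], float],
) -> List[float]:
    """Apply func to every pair of full windows of size n."""
    return [func(left[i - n + 1 : i + 1], right[i - n + 1 : i + 1]) for i in range(n - 1, len(left))]


left = [1.0, 2.0, 3.0, 4.0]
right = [2.0, 4.0, 6.0, 9.0]
rolling_cov = rolling_pair(left, right, 3, cov)
rolling_corr = rolling_pair(left, right, 3, corr)
```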

class qlib.data.ops.OpsWrapper

Ops Wrapper

__init__()

Initialize self. See help(type(self)) for accurate signature.

qlib.data.ops.register_all_ops(C)

Register all operators.

Cache

class qlib.data.cache.MemCacheUnit(*args, **kwargs)

Memory Cache Unit.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

limited

whether memory cache is limited

class qlib.data.cache.MemCache(mem_cache_size_limit=None, limit_type='length')

Memory cache.

__init__(mem_cache_size_limit=None, limit_type='length')
Parameters:
  • mem_cache_size_limit – cache max size.
  • limit_type (str) – length or sizeof; length (measured with len), sizeof (measured with sys.getsizeof).
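
The difference between the two limit types can be sketched with a small bounded cache. This is not the library's implementation: the class name `BoundedCache` is hypothetical, and evicting the oldest entry first is an assumption made for illustration only.

```python
import sys
from collections import OrderedDict


class BoundedCache:
    """Toy cache whose size is counted per item ('length') or in bytes ('sizeof')."""

    def __init__(self, size_limit=None, limit_type="length"):
        if limit_type not in ("length", "sizeof"):
            raise ValueError(f"unknown limit_type: {limit_type}")
        self.size_limit = size_limit
        self.limit_type = limit_type
        self._items = OrderedDict()
        self._size = 0

    def _cost(self, value):
        # 'length' counts every entry as 1; 'sizeof' uses sys.getsizeof
        return 1 if self.limit_type == "length" else sys.getsizeof(value)

    def __setitem__(self, key, value):
        if key in self._items:
            self._size -= self._cost(self._items.pop(key))
        self._items[key] = value
        self._size += self._cost(value)
        # evict oldest entries while over the limit (assumed policy)
        while self.size_limit is not None and self._size > self.size_limit and len(self._items) > 1:
            _, evicted = self._items.popitem(last=False)
            self._size -= self._cost(evicted)

    def __getitem__(self, key):
        return self._items[key]

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)


cache = BoundedCache(size_limit=2, limit_type="length")
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3  # exceeds the limit of 2 entries, so "a" is evicted
```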
class qlib.data.cache.ExpressionCache(provider)

Expression cache mechanism base class.

This class is used to wrap expression provider with self-defined expression cache mechanism.

Note

Override the _uri and _expression method to create your own expression cache mechanism.

expression(instrument, field, start_time, end_time, freq)

Get expression data.

Note

Same interface as expression method in expression provider

update(cache_uri)

Update expression cache to latest calendar.

Override this method to define how to update expression cache corresponding to users’ own cache mechanism.

Parameters:cache_uri (str) – the complete uri of expression cache file (include dir path).
Returns:0(successful update)/ 1(no need to update)/ 2(update failure).
Return type:int
class qlib.data.cache.DatasetCache(provider)

Dataset cache mechanism base class.

This class is used to wrap dataset provider with self-defined dataset cache mechanism.

Note

Override the _uri and _dataset method to create your own dataset cache mechanism.

dataset(instruments, fields, start_time=None, end_time=None, freq='day', disk_cache=1)

Get feature dataset.

Note

Same interface as dataset method in dataset provider

Note

The server uses redis_lock to make sure read-write conflicts are not triggered, but client readers are not considered.
update(cache_uri)

Update dataset cache to latest calendar.

Override this method to define how to update dataset cache corresponding to users’ own cache mechanism.

Parameters:cache_uri (str) – the complete uri of dataset cache file (include dir path).
Returns:0(successful update)/ 1(no need to update)/ 2(update failure)
Return type:int
static cache_to_origin_data(data, fields)

Convert cache data to original data.

Parameters:
  • data – pd.DataFrame, cache data.
  • fields – feature fields.
Returns:

pd.DataFrame.

static normalize_uri_args(instruments, fields, freq)

normalize uri args

class qlib.data.cache.DiskExpressionCache(provider, **kwargs)

Prepared cache mechanism for server.

__init__(provider, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

gen_expression_cache(expression_data, cache_path, instrument, field, freq, last_update)

Use a bin file to save the cache, in the same way as feature data.

update(sid, cache_uri)

Update expression cache to latest calendar.

Override this method to define how to update expression cache corresponding to users’ own cache mechanism.

Parameters:cache_uri (str) – the complete uri of expression cache file (include dir path).
Returns:0(successful update)/ 1(no need to update)/ 2(update failure).
Return type:int
class qlib.data.cache.DiskDatasetCache(provider, **kwargs)

Prepared cache mechanism for server.

__init__(provider, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

classmethod read_data_from_cache(cache_path, start_time, end_time, fields)

Read data from the disk cache dataset.

Parameters:
  • cache_path
  • start_time
  • end_time
  • fields – The fields order of the dataset cache is sorted. So rearrange the columns to make it consistent.
Returns:

class IndexManager(cache_path)

The lock is not considered in the class. Please consider the lock outside the code. This class is the proxy of the disk data.

__init__(cache_path)

Initialize self. See help(type(self)) for accurate signature.

gen_dataset_cache(cache_path, instruments, fields, freq)

Note

This function does not consider the cache read-write lock. Please acquire the lock outside this function.

The cache format contains 3 parts (each followed by a typical filename).

  • index : cache/d41366901e25de3ec47297f12e2ba11d.index

    • The content of the file may be in following format(pandas.Series)

                          start end
      1999-11-10 00:00:00     0   1
      1999-11-11 00:00:00     1   2
      1999-11-12 00:00:00     2   3
      ...
      

    Note

    The start is closed. The end is open.

    • Each line contains two elements <start_index, end_index> with a timestamp as its index.
    • It indicates the start_index (included) and end_index (excluded) of the data for timestamp.
  • meta data: cache/d41366901e25de3ec47297f12e2ba11d.meta

  • data : cache/d41366901e25de3ec47297f12e2ba11d

    • This is a hdf file sorted by datetime
Parameters:
  • cache_path – The path to store the cache.
  • instruments – The instruments to store the cache.
  • fields – The fields to store the cache.
  • freq – The freq to store the cache.

Returns:

a DataFrame whose fields are consistent with the parameters of the function

Return type:

pd.DataFrame
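
The half-open <start_index, end_index> convention of the index file can be illustrated with a small lookup. This sketch uses a plain dict and list in place of the on-disk index and data files. The helper name `rows_between` and the row contents are hypothetical.

```python
from typing import Dict, List, Tuple

# timestamp -> (start_index included, end_index excluded), as in the example above
index: Dict[str, Tuple[int, int]] = {
    "1999-11-10": (0, 1),
    "1999-11-11": (1, 2),
    "1999-11-12": (2, 3),
}

# stand-in for the data file, sorted by datetime
rows: List[str] = ["row@1999-11-10", "row@1999-11-11", "row@1999-11-12"]


def rows_between(index: Dict[str, Tuple[int, int]], rows: List[str], first_day: str, last_day: str) -> List[str]:
    """Return all rows from first_day through last_day (both days inclusive)."""
    start = index[first_day][0]  # closed start
    stop = index[last_day][1]    # open end, so it can be used directly as a slice bound
    return rows[start:stop]


selected = rows_between(index, rows, "1999-11-11", "1999-11-12")
```

Because the end is exclusive, the stored pair maps directly onto a Python slice without any off-by-one adjustment.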

update(cache_uri)

Update dataset cache to latest calendar.

Override this method to define how to update dataset cache corresponding to users’ own cache mechanism.

Parameters:cache_uri (str) – the complete uri of dataset cache file (include dir path).
Returns:0(successful update)/ 1(no need to update)/ 2(update failure)
Return type:int

Dataset

Dataset Class

class qlib.data.dataset.__init__.Dataset(*args, **kwargs)

Preparing data for model training and inferencing.

__init__(*args, **kwargs)

init is designed to finish following steps:

  • setup data
    • The data related attributes’ names should start with ‘_’ so that it will not be saved on disk when serializing.
  • initialize the state of the dataset(info to prepare the data)
    • The name of essential state for preparing data should not start with ‘_’ so that it could be serialized on disk when serializing.

The data could specify the info to calculate the essential data for preparation.

setup_data(*args, **kwargs)

Setup the data.

We split out the setup_data function for the following situation:

  • User has a Dataset object with learned status on disk.
  • User loads the Dataset object from the disk (note that the init function is skipped).
  • User calls setup_data to load new data.
  • User prepares data for the model based on the previous status.
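
The situation above can be sketched with a toy class. It assumes the underscore convention is implemented by filtering attributes in `__getstate__`, which this reference does not confirm. The class name `SketchDataset` is hypothetical, and the restore step is done by hand instead of through a real serializer.

```python
class SketchDataset:
    """Toy dataset: state is serialized, data (names starting with '_') is not."""

    def __init__(self, segments, raw_rows):
        self.segments = segments   # essential state, kept when serializing
        self.setup_data(raw_rows)

    def setup_data(self, raw_rows):
        self._data = list(raw_rows)  # data, dropped when serializing

    def __getstate__(self):
        # assumed rule: skip every attribute whose name starts with '_'
        return {k: v for k, v in self.__dict__.items() if not k.startswith("_")}


original = SketchDataset({"train": (0, 2)}, [1, 2, 3])
state = original.__getstate__()

# Loading from disk skips __init__, so only the saved state is restored ...
restored = SketchDataset.__new__(SketchDataset)
restored.__dict__.update(state)

# ... and the data has to be attached again through setup_data.
restored.setup_data([4, 5, 6])
```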
prepare(*args, **kwargs) → object

The type of dataset depends on the model. (It could be pd.DataFrame, pytorch.DataLoader, etc.) The parameters should specify the scope for the prepared data. The method should:

  • process the data
  • return the processed data
Returns:return the object
Return type:object
class qlib.data.dataset.__init__.DatasetH(handler: Union[dict, qlib.data.dataset.handler.DataHandler], segments: dict)

Dataset with Data(H)andler

User should try to put the data preprocessing functions into handler. Only following data processing functions should be placed in Dataset:

  • The processing is related to specific model.
  • The processing is related to data split.
__init__(handler: Union[dict, qlib.data.dataset.handler.DataHandler], segments: dict)
Parameters:
  • handler (Union[dict, DataHandler]) – handler will be passed into setup_data.
  • segments (dict) – segments will be passed into setup_data.
init(**kwargs)

Initialize the DatasetH. Only parameters belonging to handler.init will be passed in.

setup_data(handler: Union[dict, qlib.data.dataset.handler.DataHandler], segments: dict)

Setup the underlying data.

Parameters:
  • handler (Union[dict, DataHandler]) –

    handler could be:

    • instance of DataHandler
    • config of DataHandler. Please refer to DataHandler
  • segments (dict) – Describe the options to segment the data. Here are some examples:
prepare(segments: Union[List[str], Tuple[str], str, slice], col_set='__all', data_key='infer', **kwargs) → Union[List[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]

Prepare the data for learning and inference.

Parameters:
  • segments (Union[List[str], Tuple[str], str, slice]) –

    Describe the scope of the data to be prepared. Here are some examples:

    • 'train'
    • ['train', 'valid']
  • col_set (str) – The col_set will be passed to self.handler when fetching data.
  • data_key (str) – The data to fetch: DK_*. Default is DK_I, which indicates fetching data for inference.
Returns:

Return type:

Union[List[pd.DataFrame], pd.DataFrame]

Raises:

NotImplementedError:

class qlib.data.dataset.__init__.TSDataSampler(data: pandas.core.frame.DataFrame, start, end, step_len: int, fillna_type: str = 'none')

(T)ime-(S)eries DataSampler. This is the result of TSDatasetH.

It works like torch.utils.data.Dataset and provides a very convenient interface for constructing time-series datasets based on tabular data.

If users have further requirements for processing data, they could process them based on TSDataSampler or create more powerful subclasses.

Known Issues:

  • For performance reasons, this sampler converts the dataframe into arrays. This could result in a different data type.
__init__(data: pandas.core.frame.DataFrame, start, end, step_len: int, fillna_type: str = 'none')

Build a dataset which looks like torch.utils.data.Dataset.

Parameters:
  • data (pd.DataFrame) – The raw tabular data
  • start – The indexable start time
  • end – The indexable end time
  • step_len (int) – The length of the time-series step
  • fillna_type (str) –

    How qlib handles the sample if there is no sample on a specific date:

    • none: fill with np.nan
    • ffill: fill with the previous sample
    • ffill+bfill: fill with previous samples first and with later samples second
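
The three fill options can be illustrated on a single instrument's values, with None marking a date that has no sample. This is a plain-Python sketch of the described behaviour and not the sampler's actual code. The function name `fill_missing` is hypothetical.

```python
import math
from typing import List, Optional


def fill_missing(values: List[Optional[float]], fillna_type: str = "none") -> List[float]:
    """Replace missing samples according to fillna_type."""
    if fillna_type not in ("none", "ffill", "ffill+bfill"):
        raise ValueError(f"unknown fillna_type: {fillna_type}")
    # 'none': missing samples simply become NaN
    out = [math.nan if v is None else v for v in values]
    if fillna_type in ("ffill", "ffill+bfill"):
        last = math.nan
        for i, v in enumerate(out):
            if math.isnan(v):
                out[i] = last      # carry the previous sample forward
            else:
                last = v
    if fillna_type == "ffill+bfill":
        nxt = math.nan
        for i in range(len(out) - 1, -1, -1):
            if math.isnan(out[i]):
                out[i] = nxt       # remaining gaps take the next later sample
            else:
                nxt = out[i]
    return out


samples = [None, 1.0, None, 3.0]
forward = fill_missing(samples, "ffill")        # leading gap stays NaN
both = fill_missing(samples, "ffill+bfill")     # leading gap is back-filled
```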
get_index()

Get the pandas index of the data. It will be useful in the following scenarios:

  • A special sampler will be used (e.g. the user wants to sample day by day).

static build_index(data: pandas.core.frame.DataFrame) → dict

The relation of the data

Parameters:data (pd.DataFrame) – The dataframe with <datetime, DataFrame>
Returns:{<index>: <prev_index or None>} # get the previous index of a line given index
Return type:dict
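
One way to read the returned mapping is that each row points to the previous row of the same instrument, which lets a sampler walk backwards to collect a time series. The sketch below follows that reading, which is an assumption. It uses row positions as the index, and both helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def build_prev_index(keys: List[Tuple[str, str]]) -> Dict[int, Optional[int]]:
    """Map each row position to the previous row of the same instrument, or None."""
    last_seen: Dict[str, int] = {}
    prev: Dict[int, Optional[int]] = {}
    for position, (_, instrument) in enumerate(keys):
        prev[position] = last_seen.get(instrument)
        last_seen[instrument] = position
    return prev


def history(prev: Dict[int, Optional[int]], position: int, step_len: int) -> List[int]:
    """Walk the chain backwards and return up to step_len positions, oldest first."""
    chain: List[int] = []
    cur: Optional[int] = position
    while cur is not None and len(chain) < step_len:
        chain.append(cur)
        cur = prev[cur]
    return chain[::-1]


# rows sorted by <datetime, instrument>
keys = [
    ("2010-01-04", "SH600000"),
    ("2010-01-04", "SH600004"),
    ("2010-01-05", "SH600000"),
    ("2010-01-05", "SH600004"),
    ("2010-01-06", "SH600000"),
]
prev = build_prev_index(keys)
steps = history(prev, 4, 3)  # the three most recent rows of SH600000
```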
class qlib.data.dataset.__init__.TSDatasetH(step_len=30, *args, **kwargs)

(T)ime-(S)eries Dataset (H)andler

Convert the tabular data to time-series data.

Requirements analysis

The typical workflow of a user to get time-series data for a sample:

  • process features
  • slice proper data from data handler: dimension of sample <feature, >
  • build relation of samples by <time, instrument> index

  • Be able to sample times series of data <timestep, feature>
  • It will be better if the interface is like “torch.utils.data.Dataset”
  • User could build customized batch based on the data
    • The dimension of a batch of data <batch_idx, feature, timestep>
__init__(step_len=30, *args, **kwargs)
Parameters:
  • handler (Union[dict, DataHandler]) – handler will be passed into setup_data.
  • segments (dict) – segments will be passed into setup_data.
setup_data(*args, **kwargs)

Setup the underlying data.

Parameters:
  • handler (Union[dict, DataHandler]) –

    handler could be:

    • instance of DataHandler
    • config of DataHandler. Please refer to DataHandler
  • segments (dict) – Describe the options to segment the data. Here are some examples:

Data Loader

class qlib.data.dataset.loader.DataLoader

DataLoader is designed for loading raw data from original data source.

load(instruments, start_time=None, end_time=None) → pandas.core.frame.DataFrame

load the data as pd.DataFrame.

Example of the data (The multi-index of the columns is optional.):

                        feature                                                             label
                        $close     $volume     Ref($close, 1)  Mean($close, 3)  $high-$low  LABEL0
datetime    instrument
2010-01-04  SH600000    81.807068  17145150.0       83.737389        83.016739    2.741058  0.0032
            SH600004    13.313329  11800983.0       13.313329        13.317701    0.183632  0.0042
            SH600005    37.796539  12231662.0       38.258602        37.919757    0.970325  0.0289
Parameters:
  • instruments (str or dict) – it can either be the market name or the config file of instruments generated by InstrumentProvider.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
Returns:

data loaded from the underlying source

Return type:

pd.DataFrame

class qlib.data.dataset.loader.DLWParser(config: Tuple[list, tuple, dict])

(D)ata(L)oader (W)ith (P)arser for features and names

Extracting this class so that QlibDataLoader and other dataloaders(such as QdbDataLoader) can share the fields.

__init__(config: Tuple[list, tuple, dict])
Parameters:config (Tuple[list, tuple, dict]) – Config will be used to describe the fields and column names
load_group_df(instruments, exprs: list, names: list, start_time=None, end_time=None) → pandas.core.frame.DataFrame

load the dataframe for specific group

Parameters:
  • instruments – the instruments.
  • exprs (list) – the expressions to describe the content of the data.
  • names (list) – the name of the data.
Returns:

the queried dataframe.

Return type:

pd.DataFrame

load(instruments=None, start_time=None, end_time=None) → pandas.core.frame.DataFrame

load the data as pd.DataFrame.

Example of the data (The multi-index of the columns is optional.):

                        feature                                                             label
                        $close     $volume     Ref($close, 1)  Mean($close, 3)  $high-$low  LABEL0
datetime    instrument
2010-01-04  SH600000    81.807068  17145150.0       83.737389        83.016739    2.741058  0.0032
            SH600004    13.313329  11800983.0       13.313329        13.317701    0.183632  0.0042
            SH600005    37.796539  12231662.0       38.258602        37.919757    0.970325  0.0289
Parameters:
  • instruments (str or dict) – it can either be the market name or the config file of instruments generated by InstrumentProvider.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
Returns:

data loaded from the underlying source

Return type:

pd.DataFrame

class qlib.data.dataset.loader.QlibDataLoader(config: Tuple[list, tuple, dict], filter_pipe=None, swap_level=True, freq='day')

DataLoader for Qlib data. The fields can be defined by config.

__init__(config: Tuple[list, tuple, dict], filter_pipe=None, swap_level=True, freq='day')
Parameters:
  • config (Tuple[list, tuple, dict]) – Please refer to the doc of DLWParser
  • filter_pipe – Filter pipe for the instruments
  • swap_level – Whether to swap level of MultiIndex
load_group_df(instruments, exprs: list, names: list, start_time=None, end_time=None) → pandas.core.frame.DataFrame

load the dataframe for specific group

Parameters:
  • instruments – the instruments.
  • exprs (list) – the expressions to describe the content of the data.
  • names (list) – the name of the data.
Returns:

the queried dataframe.

Return type:

pd.DataFrame

class qlib.data.dataset.loader.StaticDataLoader(config: dict, join='outer')

DataLoader that supports loading data from file or as provided.

__init__(config: dict, join='outer')
Parameters:
  • config (dict) – {fields_group: <path or object>}
  • join (str) – How to align different dataframes
load(instruments=None, start_time=None, end_time=None) → pandas.core.frame.DataFrame

load the data as pd.DataFrame.

Example of the data (The multi-index of the columns is optional.):

                        feature                                                             label
                        $close     $volume     Ref($close, 1)  Mean($close, 3)  $high-$low  LABEL0
datetime    instrument
2010-01-04  SH600000    81.807068  17145150.0       83.737389        83.016739    2.741058  0.0032
            SH600004    13.313329  11800983.0       13.313329        13.317701    0.183632  0.0042
            SH600005    37.796539  12231662.0       38.258602        37.919757    0.970325  0.0289
Parameters:
  • instruments (str or dict) – it can either be the market name or the config file of instruments generated by InstrumentProvider.
  • start_time (str) – start of the time range.
  • end_time (str) – end of the time range.
Returns:

data loaded from the underlying source

Return type:

pd.DataFrame

Data Handler

class qlib.data.dataset.handler.DataHandler(instruments=None, start_time=None, end_time=None, data_loader: Tuple[dict, str, qlib.data.dataset.loader.DataLoader] = None, init_data=True, fetch_orig=True)

The steps to using a handler:

  1. initialize the data handler (called by init).
  2. use the data.

The data handler tries to maintain a handler with 2 levels: datetime & instruments.

Any order of the index levels can be supported (the order will be implied by the data). The order <datetime, instruments> will be used when the dataframe index name is missing.

Example of the data: The multi-index of the columns is optional.

                        feature                                                             label
                        $close     $volume     Ref($close, 1)  Mean($close, 3)  $high-$low  LABEL0
datetime    instrument
2010-01-04  SH600000    81.807068  17145150.0       83.737389        83.016739    2.741058  0.0032
            SH600004    13.313329  11800983.0       13.313329        13.317701    0.183632  0.0042
            SH600005    37.796539  12231662.0       38.258602        37.919757    0.970325  0.0289
__init__(instruments=None, start_time=None, end_time=None, data_loader: Tuple[dict, str, qlib.data.dataset.loader.DataLoader] = None, init_data=True, fetch_orig=True)
Parameters:
  • instruments – The stock list to retrieve.
  • start_time – start_time of the original data.
  • end_time – end_time of the original data.
  • data_loader (Tuple[dict, str, DataLoader]) – data loader to load the data.
  • init_data – initialize the original data in the constructor.
  • fetch_orig (bool) – Return the original data instead of copy if possible.
conf_data(**kwargs)

Configure the data, i.e. what data is to be loaded from the data source.

This method will be used when loading pickled handler from dataset. The data will be initialized with different time range.

init(enable_cache: bool = False)

Initialize the data. If initialization is run multiple times, it does nothing after the first time.

It is responsible for maintaining the following variable: 1) self._data

Parameters:enable_cache (bool) –

default value is false:

  • if enable_cache == True:
    the processed data will be saved on disk, and handler will load the cached data from the disk directly when we call init next time
fetch(selector: Union[pandas._libs.tslibs.timestamps.Timestamp, slice, str] = slice(None, None, None), level: Union[str, int] = 'datetime', col_set: Union[str, List[str]] = '__all', squeeze: bool = False) → pandas.core.frame.DataFrame

fetch data from underlying data source

Parameters:
  • selector (Union[pd.Timestamp, slice, str]) – describe how to select data by index
  • level (Union[str, int]) – which index level to select the data
  • col_set (Union[str, List[str]]) –
    • if isinstance(col_set, str):
      select a set of meaningful columns (e.g. features, columns).
      if col_set == CS_RAW:
      the raw dataset will be returned.
    • if isinstance(col_set, List[str]):
      select several sets of meaningful columns, the returned data has multiple levels
  • squeeze (bool) – whether squeeze columns and index
Returns:

Return type:

pd.DataFrame.

get_cols(col_set='__all') → list

get the column names

Parameters:col_set (str) – select a set of meaningful columns.(e.g. features, columns)
Returns:list of column names
Return type:list
get_range_selector(cur_date: Union[pandas._libs.tslibs.timestamps.Timestamp, str], periods: int) → slice

get range selector by number of periods

Parameters:
  • cur_date (pd.Timestamp or str) – current date
  • periods (int) – number of periods
get_range_iterator(periods: int, min_periods: Optional[int] = None, **kwargs) → Iterator[Tuple[pandas._libs.tslibs.timestamps.Timestamp, pandas.core.frame.DataFrame]]

get an iterator of sliced data with given periods

Parameters:
  • periods (int) – number of periods.
  • min_periods (int) – minimum periods for sliced dataframe.
  • kwargs (dict) – will be passed to self.fetch.
class qlib.data.dataset.handler.DataHandlerLP(instruments=None, start_time=None, end_time=None, data_loader: Tuple[dict, str, qlib.data.dataset.loader.DataLoader] = None, infer_processors=[], learn_processors=[], process_type='append', drop_raw=False, **kwargs)

DataHandler with (L)earnable (P)rocessor

__init__(instruments=None, start_time=None, end_time=None, data_loader: Tuple[dict, str, qlib.data.dataset.loader.DataLoader] = None, infer_processors=[], learn_processors=[], process_type='append', drop_raw=False, **kwargs)
Parameters:
  • infer_processors (list) –
    • list of <description info> of processors to generate data for inference
    • example of <description info>:
  • learn_processors (list) – similar to infer_processors, but for generating data for learning models
  • process_type (str) –

    PTYPE_I = ‘independent’

    • self._infer will be processed by infer_processors
    • self._learn will be processed by learn_processors

    PTYPE_A = ‘append’

    • self._infer will be processed by infer_processors
    • self._learn will be processed by infer_processors + learn_processors
      • (e.g. self._infer processed by learn_processors )
  • drop_raw (bool) – Whether to drop the raw data
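
The difference between the two process types can be sketched with plain functions standing in for processors. This is an illustration of the rules listed above and not the handler's code. The names `run`, `build_views`, `double` and `add_one` are hypothetical.

```python
from typing import Callable, List, Tuple

Rows = List[float]
Proc = Callable[[Rows], Rows]


def run(processors: List[Proc], rows: Rows) -> Rows:
    """Apply processors one after another; each sees the previous output."""
    for processor in processors:
        rows = processor(rows)
    return rows


def build_views(raw: Rows, infer_processors: List[Proc], learn_processors: List[Proc],
                process_type: str = "append") -> Tuple[Rows, Rows]:
    """Return (infer_data, learn_data) for the given process_type."""
    infer = run(infer_processors, list(raw))
    if process_type == "independent":
        # learn data starts again from the raw data
        learn = run(learn_processors, list(raw))
    elif process_type == "append":
        # learn data continues from the inference data
        learn = run(learn_processors, list(infer))
    else:
        raise ValueError(f"unknown process_type: {process_type}")
    return infer, learn


def double(rows: Rows) -> Rows:
    return [r * 2 for r in rows]


def add_one(rows: Rows) -> Rows:
    return [r + 1 for r in rows]


raw = [1.0, 2.0]
appended = build_views(raw, [double], [add_one], "append")
independent = build_views(raw, [double], [add_one], "independent")
```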
fit_process_data()

fit and process data

The input of the fit will be the output of the previous processor

process_data(with_fit: bool = False)

Process the data. Run processor.fit if necessary.

Parameters:with_fit (bool) – The input of the fit will be the output of the previous processor
init(init_type: str = 'fit_seq', enable_cache: bool = False)

Initialize the data of Qlib

Parameters:
  • init_type (str) – The type IT_* listed above.
  • enable_cache (bool) –

    default value is false:

    • if enable_cache == True:
      the processed data will be saved on disk, and handler will load the cached data from the disk directly when we call init next time
fetch(selector: Union[pandas._libs.tslibs.timestamps.Timestamp, slice, str] = slice(None, None, None), level: Union[str, int] = 'datetime', col_set='__all', data_key: str = 'infer') → pandas.core.frame.DataFrame

fetch data from underlying data source

Parameters:
  • selector (Union[pd.Timestamp, slice, str]) – describe how to select data by index.
  • level (Union[str, int]) – which index level to select the data.
  • col_set (str) – select a set of meaningful columns.(e.g. features, columns).
  • data_key (str) – the data to fetch: DK_*.
Returns:

Return type:

pd.DataFrame

get_cols(col_set='__all', data_key: str = 'infer') → list

get the column names

Parameters:
  • col_set (str) – select a set of meaningful columns.(e.g. features, columns).
  • data_key (str) – the data to fetch: DK_*.
Returns:

list of column names

Return type:

list

Processor

qlib.data.dataset.processor.get_group_columns(df: pandas.core.frame.DataFrame, group: str)

get a group of columns from multi-index columns DataFrame

Parameters:
  • df (pd.DataFrame) – DataFrame with multi-index columns.
  • group (str) – the name of the feature group, i.e. the first level value of the group index.
class qlib.data.dataset.processor.Processor
fit(df: pandas.core.frame.DataFrame = None)

learn data processing parameters

Parameters:df (pd.DataFrame) – When we fit and process data with processors one by one, the fit function relies on the output of the previous processor, i.e. df.
is_for_infer() → bool

Is this processor usable for inference? Some processors are not usable for inference.

Returns:if it is usable for inference.
Return type:bool
class qlib.data.dataset.processor.DropnaProcessor(fields_group=None)
__init__(fields_group=None)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.dataset.processor.DropnaLabel(fields_group='label')
__init__(fields_group='label')

Initialize self. See help(type(self)) for accurate signature.

is_for_infer() → bool

The samples are dropped according to label. So it is not usable for inference

class qlib.data.dataset.processor.DropCol(col_list=[])
__init__(col_list=[])

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.dataset.processor.FilterCol(fields_group='feature', col_list=[])
__init__(fields_group='feature', col_list=[])

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.dataset.processor.TanhProcess

Use tanh to process noise data

class qlib.data.dataset.processor.ProcessInf

Process infinity

class qlib.data.dataset.processor.Fillna(fields_group=None, fill_value=0)

Process NaN

__init__(fields_group=None, fill_value=0)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.dataset.processor.MinMaxNorm(fit_start_time, fit_end_time, fields_group=None)
__init__(fit_start_time, fit_end_time, fields_group=None)

Initialize self. See help(type(self)) for accurate signature.

fit(df)

learn data processing parameters

Parameters:df (pd.DataFrame) – When we fit and process data with processors one by one, the fit function relies on the output of the previous processor, i.e. df.
class qlib.data.dataset.processor.ZScoreNorm(fit_start_time, fit_end_time, fields_group=None)

ZScore Normalization

__init__(fit_start_time, fit_end_time, fields_group=None)

Initialize self. See help(type(self)) for accurate signature.

fit(df)

learn data processing parameters

Parameters:df (pd.DataFrame) – When we fit and process data with processors one by one, the fit function relies on the output of the previous processor, i.e. df.
class qlib.data.dataset.processor.RobustZScoreNorm(fit_start_time, fit_end_time, fields_group=None, clip_outlier=True)

Robust ZScore Normalization

Use robust statistics for Z-Score normalization:
mean(x) = median(x)
std(x) = MAD(x) * 1.4826
Reference:
https://en.wikipedia.org/wiki/Median_absolute_deviation.
__init__(fit_start_time, fit_end_time, fields_group=None, clip_outlier=True)

Initialize self. See help(type(self)) for accurate signature.

fit(df)

learn data processing parameters

Parameters:df (pd.DataFrame) – When we fit and process data with processors one by one, the fit function relies on the output of the previous processor, i.e. df.
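
The robust statistics above can be sketched with the standard library for a single column. The clipping bound of 3 used for clip_outlier is an assumption, since this reference does not state it. The function names `fit_robust` and `transform` are hypothetical.

```python
import statistics
from typing import List, Tuple


def fit_robust(values: List[float]) -> Tuple[float, float]:
    """Learn (center, scale) with center = median(x) and scale = MAD(x) * 1.4826."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    return center, mad * 1.4826


def transform(values: List[float], center: float, scale: float,
              clip_outlier: bool = True, clip: float = 3.0) -> List[float]:
    """Normalize with the fitted statistics; optionally clip to [-clip, clip] (assumed bound)."""
    if scale == 0:
        raise ValueError("scale is zero; the fitted column has no spread")
    out = [(v - center) / scale for v in values]
    if clip_outlier:
        out = [max(-clip, min(clip, z)) for z in out]
    return out


fit_values = [1.0, 2.0, 3.0, 4.0, 100.0]   # 100.0 is an outlier
center, scale = fit_robust(fit_values)     # median 3.0, MAD 1.0
normalized = transform(fit_values, center, scale)
```

Because the median and MAD barely move when the outlier is present, the remaining values keep a sensible scale, and the outlier itself ends up at the clipping bound.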
class qlib.data.dataset.processor.CSZScoreNorm(fields_group=None)

Cross Sectional ZScore Normalization

__init__(fields_group=None)

Initialize self. See help(type(self)) for accurate signature.
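Unlike the fitted normalizers above, a cross-sectional normalizer works within each date across instruments. The sketch below shows that idea in plain Python. The function name cs_zscore and the (datetime, instrument, value) row layout are assumptions for illustration, and the sample standard deviation is an assumed choice.

```python
from collections import defaultdict
from statistics import mean, stdev


def cs_zscore(rows):
    """Z-score each value against the other instruments on the same date.

    rows: list of (datetime, instrument, value).
    Returns a dict keyed by (datetime, instrument).
    """
    by_date = defaultdict(list)
    for date, _, value in rows:
        by_date[date].append(value)

    out = {}
    for date, instrument, value in rows:
        mu = mean(by_date[date])
        sigma = stdev(by_date[date])  # sample std (assumed choice)
        out[(date, instrument)] = (value - mu) / sigma
    return out


rows = [
    ("2020-01-02", "A", 1.0), ("2020-01-02", "B", 2.0), ("2020-01-02", "C", 3.0),
    ("2020-01-03", "A", 10.0), ("2020-01-03", "B", 30.0),
]
normalized = cs_zscore(rows)
```

No fit window is needed here, which matches the signature above: it takes only fields_group.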

class qlib.data.dataset.processor.CSRankNorm(fields_group=None)

Cross Sectional Rank Normalization

__init__(fields_group=None)

Initialize self. See help(type(self)) for accurate signature.

class qlib.data.dataset.processor.CSZFillna(fields_group=None)

Cross Sectional Fill NaN

__init__(fields_group=None)

Initialize self. See help(type(self)) for accurate signature.

Contrib

Model

class qlib.model.base.BaseModel

Modeling things

predict(*args, **kwargs) → object

Make predictions after modeling things

class qlib.model.base.Model

Learnable Models

fit(dataset: qlib.data.dataset.Dataset)

Learn model from the base model

Note

The attribute names of a learned model should not start with ‘_’, so that the model can be dumped to disk.

The following code example shows how to retrieve x_train, y_train and w_train from the dataset:

import numpy as np
import pandas as pd
from qlib.data.dataset.handler import DataHandlerLP

# get features and labels
df_train, df_valid = dataset.prepare(
    ["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
)
x_train, y_train = df_train["feature"], df_train["label"]
x_valid, y_valid = df_valid["feature"], df_valid["label"]

# get weights; fall back to uniform weights when the dataset has no "weight" column set
try:
    wdf_train, wdf_valid = dataset.prepare(["train", "valid"], col_set=["weight"], data_key=DataHandlerLP.DK_L)
    w_train, w_valid = wdf_train["weight"], wdf_valid["weight"]
except KeyError:
    w_train = pd.DataFrame(np.ones_like(y_train.values), index=y_train.index)
    w_valid = pd.DataFrame(np.ones_like(y_valid.values), index=y_valid.index)
Parameters:dataset (Dataset) – dataset will generate the processed data for model training.
predict(dataset: qlib.data.dataset.Dataset) → object

Make predictions for the given dataset.

Parameters:dataset (Dataset) – dataset will generate the processed dataset for model training.
Returns:Prediction results with certain type such as pandas.Series.
class qlib.model.base.ModelFT

Model (F)ine(t)unable

finetune(dataset: qlib.data.dataset.Dataset)

Fine-tune the model based on the given dataset.

A typical use case of fine-tuning a model with qlib.workflow.R:

from qlib.workflow import R

# start exp to train init model
with R.start(experiment_name="init models"):
    model.fit(dataset)
    R.save_objects(init_model=model)
    rid = R.get_recorder().id

# Finetune model based on previous trained model
with R.start(experiment_name="finetune model"):
    recorder = R.get_recorder(rid, experiment_name="init models")
    model = recorder.load_object("init_model")
    model.finetune(dataset, num_boost_round=10)
Parameters:dataset (Dataset) – dataset will generate the processed dataset for model training.

Strategy

class qlib.contrib.strategy.strategy.StrategyWrapper(inner_strategy)

StrategyWrapper is a wrapper of another strategy. It overrides some methods to change the behavior of the basic strategy. Cost control and risk control will be based on this class.

__init__(inner_strategy)
Parameters:inner_strategy – set the inner strategy.
class qlib.contrib.strategy.strategy.AdjustTimer

Responsible for timing of position adjusting

This is designed as a multiple inheritance mechanism because:

  • is_adjust may need access to the internal state of a strategy.
  • it can be regarded as an enhancement to the existing strategy.
is_adjust(trade_date)

Return whether the strategy can adjust positions on trade_date. It is normally used by strategies that trade at a given trade frequency.

class qlib.contrib.strategy.strategy.ListAdjustTimer(adjust_dates=None)
__init__(adjust_dates=None)
Parameters:adjust_dates – an iterable object; iterating over it yields the list of trading dates.
is_adjust(trade_date)

Return whether the strategy can adjust positions on trade_date. It is normally used by strategies that trade at a given trade frequency.
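A list-based adjust timer can be pictured as a membership test against a set of dates. The toy class below is a sketch under assumptions, not the library class: the name ListAdjustTimerSketch is hypothetical, and treating adjust_dates=None as "adjust on every date" is an assumed behavior.

```python
from datetime import date


class ListAdjustTimerSketch:
    """Toy stand-in: positions may be adjusted only on the listed dates."""

    def __init__(self, adjust_dates=None):
        # Assumption: None means no restriction, so every date is allowed.
        self.adjust_dates = None if adjust_dates is None else set(adjust_dates)

    def is_adjust(self, trade_date):
        if self.adjust_dates is None:
            return True
        return trade_date in self.adjust_dates


weekly_timer = ListAdjustTimerSketch([date(2020, 1, 6), date(2020, 1, 13)])
can_trade_monday = weekly_timer.is_adjust(date(2020, 1, 6))
can_trade_tuesday = weekly_timer.is_adjust(date(2020, 1, 7))
```

A strategy that mixes in such a timer would call is_adjust(trade_date) before generating orders and skip the date when it returns False.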

class qlib.contrib.strategy.strategy.WeightStrategyBase(order_generator_cls_or_obj=<class 'qlib.contrib.strategy.order_generator.OrderGenWInteract'>, *args, **kwargs)
__init__(order_generator_cls_or_obj=<class 'qlib.contrib.strategy.order_generator.OrderGenWInteract'>, *args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

generate_target_weight_position(score, current, trade_date)

Generate target position from score for this date and the current position. The cash is not considered in the position.

Parameters:
  • score (pd.Series) – pred score for this trade date, index is stock_id, contains ‘score’ column.
  • current (Position()) – current position.
  • trade_exchange (Exchange()) –
  • trade_date (pd.Timestamp) – trade date.
generate_order_list(score_series, current, trade_exchange, pred_date, trade_date)
Parameters:
  • score_series (pd.Series) – stock_id, score.
  • current (Position()) – current position of the account.
  • trade_exchange (Exchange()) – exchange.
  • trade_date (pd.Timestamp) – date.
class qlib.contrib.strategy.strategy.TopkDropoutStrategy(topk, n_drop, method_sell='bottom', method_buy='top', risk_degree=0.95, thresh=1, hold_thresh=1, only_tradable=False, **kwargs)
__init__(topk, n_drop, method_sell='bottom', method_buy='top', risk_degree=0.95, thresh=1, hold_thresh=1, only_tradable=False, **kwargs)
Parameters:
  • topk (int) – the number of stocks in the portfolio.
  • n_drop (int) – number of stocks to be replaced in each trading date.
  • method_sell (str) – dropout method_sell, random/bottom.
  • method_buy (str) – dropout method_buy, random/top.
  • risk_degree (float) – position percentage of total value.
  • thresh (int) – minimum holding days since last buy signal of the stock.
  • hold_thresh (int) – minimum holding days before selling a stock; will check current.get_stock_count(order.stock_id) >= self.thresh.
  • only_tradable (bool) –

    whether the strategy only considers tradable stocks when buying and selling.

    if only_tradable:
    the strategy makes decisions using the tradable state of the stock and avoids buying or selling untradable stocks.
    else:
    the strategy makes buy and sell decisions without checking the tradable state of the stock.
get_risk_degree(date)

Return the proportion of your total value that you will use in investment. Changing risk_degree dynamically results in market timing.

generate_order_list(score_series, current, trade_exchange, pred_date, trade_date)

Generate order list according to score_series at trade_date; will not change current.

Parameters:
  • score_series (pd.Series) – stock_id, score.
  • current (Position()) – current position of the account.
  • trade_exchange (Exchange()) – exchange.
  • pred_date (pd.Timestamp) – predict date.
  • trade_date (pd.Timestamp) – trade date.
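To make the roles of topk and n_drop concrete, here is a simplified sketch of one rebalancing step with method_sell='bottom' and method_buy='top'. It is written under assumptions and is not the library's implementation: the function name topk_dropout_step is hypothetical, every holding is assumed to have a score on this date, and thresh, hold_thresh, tradability and order sizing are ignored.

```python
def topk_dropout_step(scores, holdings, topk, n_drop):
    """Return (sell, buy) stock lists for one trading date.

    scores: dict stock_id -> predicted score for this date.
    holdings: iterable of stock_ids currently held.
    """
    holdings = set(holdings)
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    held = [s for s in ranked if s in holdings]

    # New candidates: enough to refill the portfolio and replace n_drop names.
    n_candidates = max(n_drop + topk - len(held), 0)
    candidates = [s for s in ranked if s not in holdings][:n_candidates]

    # Sell only holdings that land in the bottom n_drop of the combined pool.
    combined = sorted(held + candidates, key=scores.get, reverse=True)
    worst = set(combined[-n_drop:]) if n_drop > 0 else set()
    sell = [s for s in held if s in worst]

    # Buy the best candidates to get back to topk names.
    buy = candidates[: len(sell) + topk - len(held)]
    return sell, buy


scores = {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.7, "E": 0.6, "F": 0.05}
sell, buy = topk_dropout_step(scores, ["A", "B", "C"], topk=3, n_drop=1)
keep_sell, keep_buy = topk_dropout_step(scores, ["A", "B", "D"], topk=3, n_drop=1)
```

In the first call the weakest holding C is replaced by D. In the second call the portfolio already holds the three best names, so nothing is traded, which is how the drop limit keeps turnover down.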

Evaluate

qlib.contrib.evaluate.risk_analysis(r, N=252)

Risk Analysis

Parameters:
  • r (pandas.Series) – daily return series.
  • N (int) – scaling factor for annualizing information_ratio (day: 250, week: 50, month: 12); the default is 252.
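The sketch below shows how such risk metrics could be computed from a return series. The five metric names are the ones that appear in the sample analysis_df table shown for risk_analysis_graph in this reference. The formulas are commonly used definitions and are an assumption here, not a statement of what the library computes; the function name risk_analysis_sketch is hypothetical.

```python
import math
from itertools import accumulate
from statistics import mean, stdev


def risk_analysis_sketch(returns, N=252):
    """Compute simple risk metrics from a list of per-period returns."""
    mu = mean(returns)
    sigma = stdev(returns)                           # sample standard deviation
    cumulative = list(accumulate(returns))           # summed, not compounded
    running_max = list(accumulate(cumulative, max))  # highest level so far
    return {
        "mean": mu,
        "std": sigma,
        "annualized_return": mu * N,
        "information_ratio": mu / sigma * math.sqrt(N),
        "max_drawdown": min(c - m for c, m in zip(cumulative, running_max)),
    }


result = risk_analysis_sketch([0.01, -0.02, 0.03, -0.01])
```

N only rescales the annualized figures, so it should match the frequency of the return series.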
qlib.contrib.evaluate.backtest(pred, account=1000000000.0, shift=1, benchmark='SH000905', verbose=True, **kwargs)

This function will help you set a reasonable Exchange and provide default values for the strategy.

Parameters:
  • backtest workflow related or common arguments (-) –
  • pred (pandas.DataFrame) – predict should have <datetime, instrument> index and one score column.
  • account (float) – init account value.
  • shift (int) – whether to shift prediction by one day.
  • benchmark (str) – benchmark code, default is SH000905 CSI 500.
  • verbose (bool) – whether to print log.
  • strategy related arguments (-) –
  • strategy (Strategy()) – strategy used in backtest.
  • topk (int) – top-N stocks to buy. (Default value: 50)
  • margin –

    • if isinstance(margin, int):

      sell_limit = margin

    • else:

      sell_limit = pred_in_a_day.count() * margin

    buffer margin, in single score_mode, continue holding stock if it is in nlargest(sell_limit). sell_limit should be no less than topk.
  • n_drop (int) – number of stocks to be replaced in each trading date.
  • risk_degree (float) – 0-1, 0.95 for example, use 95% money to trade.
  • str_type ('amount', 'weight' or 'dropout') – strategy type: TopkAmountStrategy, TopkWeightStrategy or TopkDropoutStrategy.
  • exchange related arguments (-) –
  • exchange (Exchange()) – pass the exchange for speeding up.
  • subscribe_fields (list) – subscribe fields.
  • open_cost (float) – open transaction cost. The default value is 0.002(0.2%).
  • close_cost (float) – close transaction cost. The default value is 0.002(0.2%).
  • min_cost (float) – min transaction cost.
  • trade_unit (int) – 100 for China A.
  • deal_price (str) – dealing price type: ‘close’, ‘open’, ‘vwap’.
  • limit_threshold (float) – limit move 0.1 (10%) for example, long and short with same limit.
  • extract_codes (bool) –

    whether to pass the codes extracted from the pred to the exchange.

    Note

    This will be faster with offline qlib.

  • executor related arguments (-) –
  • executor (BaseExecutor()) – executor used in backtest.
  • verbose (bool) – whether to print log.
qlib.contrib.evaluate.long_short_backtest(pred, topk=50, deal_price=None, shift=1, open_cost=0, close_cost=0, trade_unit=None, limit_threshold=None, min_cost=5, subscribe_fields=[], extract_codes=False)

A backtest for long-short strategy

Parameters:
  • pred – The trading signal produced on day T.
  • topk – The short topk securities and long topk securities.
  • deal_price – The price to deal the trading.
  • shift – Whether to shift prediction by one day. The trading day will be T+1 if shift==1.
  • open_cost – open transaction cost.
  • close_cost – close transaction cost.
  • trade_unit – 100 for China A.
  • limit_threshold – limit move 0.1 (10%) for example, long and short with same limit.
  • min_cost – min transaction cost.
  • subscribe_fields – subscribe fields.
  • extract_codes – bool. Whether to pass the codes extracted from the pred to the exchange. NOTE: This will be faster with offline qlib.
Returns:

The result of backtest, represented by a dict: {"long": long_returns(excess), "short": short_returns(excess), "long_short": long_short_returns}

Report

qlib.contrib.report.analysis_position.report.report_graph(report_df: pandas.core.frame.DataFrame, show_notebook: bool = True) → [<class 'list'>, <class 'tuple'>]

display backtest report

Example:

from qlib.contrib.evaluate import backtest
from qlib.contrib.strategy import TopkDropoutStrategy
from qlib.contrib.report import analysis_position

# backtest parameters
bparas = {}
bparas['limit_threshold'] = 0.095
bparas['account'] = 1000000000

# strategy parameters
sparas = {}
sparas['topk'] = 50
sparas['n_drop'] = 230
strategy = TopkDropoutStrategy(**sparas)

# pass the strategy by keyword; the second positional parameter of backtest is account
report_normal_df, _ = backtest(pred_df, strategy=strategy, **bparas)

analysis_position.report_graph(report_normal_df)
Parameters:
  • report_df

    df.index.name must be date, df.columns must contain return, turnover, cost, bench.

                return      cost        bench       turnover
    date
    2017-01-04  0.003421    0.000864    0.011693    0.576325
    2017-01-05  0.000508    0.000447    0.000721    0.227882
    2017-01-06  -0.003321   0.000212    -0.004322   0.102765
    2017-01-09  0.006753    0.000212    0.006874    0.105864
    2017-01-10  -0.000416   0.000440    -0.003350   0.208396
    
  • show_notebook – whether to display graphics in notebook, the default is True.
Returns:

if show_notebook is True, display in notebook; else return plotly.graph_objs.Figure list.

qlib.contrib.report.analysis_position.score_ic.score_ic_graph(pred_label: pandas.core.frame.DataFrame, show_notebook: bool = True) → [<class 'list'>, <class 'tuple'>]

score IC

Example:

import pandas as pd
from qlib.data import D
from qlib.contrib.report import analysis_position

pred_df_dates = pred_df.index.get_level_values(level='datetime')
features_df = D.features(D.instruments('csi500'), ['Ref($close, -2)/Ref($close, -1)-1'], pred_df_dates.min(), pred_df_dates.max())
features_df.columns = ['label']
# join the labels with the predictions in pred_df
pred_label = pd.concat([features_df, pred_df], axis=1, sort=True).reindex(features_df.index)
analysis_position.score_ic_graph(pred_label)
Parameters:
  • pred_label

    index is pd.MultiIndex, index names are [instrument, datetime]; column names are [score, label].

    instrument  datetime        score         label
    SH600004  2017-12-11     -0.013502       -0.013502
                2017-12-12   -0.072367       -0.072367
                2017-12-13   -0.068605       -0.068605
                2017-12-14    0.012440        0.012440
                2017-12-15   -0.102778       -0.102778
    
  • show_notebook – whether to display graphics in notebook, the default is True.
Returns:

if show_notebook is True, display in notebook; else return plotly.graph_objs.Figure list.

qlib.contrib.report.analysis_position.cumulative_return.cumulative_return_graph(position: dict, report_normal: pandas.core.frame.DataFrame, label_data: pandas.core.frame.DataFrame, show_notebook=True, start_date=None, end_date=None) → Iterable[plotly.graph_objs._figure.Figure]

Backtest buy, sell, and holding cumulative return graph

Example:

from qlib.data import D
from qlib.contrib.evaluate import risk_analysis, backtest, long_short_backtest
from qlib.contrib.strategy import TopkDropoutStrategy
from qlib.contrib.report import analysis_position

# backtest parameters
bparas = {}
bparas['limit_threshold'] = 0.095
bparas['account'] = 1000000000

# strategy parameters
sparas = {}
sparas['topk'] = 50
sparas['n_drop'] = 5
strategy = TopkDropoutStrategy(**sparas)

# pass the strategy by keyword; the second positional parameter of backtest is account
report_normal_df, positions = backtest(pred_df, strategy=strategy, **bparas)

pred_df_dates = pred_df.index.get_level_values(level='datetime')
features_df = D.features(D.instruments('csi500'), ['Ref($close, -1)/$close - 1'], pred_df_dates.min(), pred_df_dates.max())
features_df.columns = ['label']

analysis_position.cumulative_return_graph(positions, report_normal_df, features_df)
Graph desc:
  • Axis X: Trading day.
  • Axis Y:
  • Above axis Y: (((Ref($close, -1)/$close - 1) * weight).sum() / weight.sum()).cumsum().
  • Below axis Y: Daily weight sum.
  • In the sell graph, y < 0 stands for profit; in other cases, y > 0 stands for profit.
  • In the buy_minus_sell graph, the y value of the weight graph at the bottom is buy_weight + sell_weight.
  • In each graph, the red line in the histogram on the right represents the average.
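The expression given for the upper Y axis, a weight-averaged label per day followed by a cumulative sum, can be sketched in plain Python as below. The function name weighted_cumulative_return and the list-of-(label, weight) layout are assumptions made for this illustration.

```python
from itertools import accumulate


def weighted_cumulative_return(daily_positions):
    """Cumulative sum of the daily weight-averaged label.

    daily_positions: one list per trading day of (label, weight) pairs,
    where label plays the role of Ref($close, -1)/$close - 1.
    """
    daily = []
    for day in daily_positions:
        total_weight = sum(weight for _, weight in day)
        daily.append(sum(label * weight for label, weight in day) / total_weight)
    return list(accumulate(daily))


curve = weighted_cumulative_return([
    [(0.02, 0.5), (-0.01, 0.5)],   # day 1: weighted mean 0.005
    [(0.04, 0.25), (0.00, 0.75)],  # day 2: weighted mean 0.010
])
```

The per-day total weight computed inside the loop corresponds to the "daily weight sum" plotted below the axis.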
Parameters:
  • position – position data
  • report_normal
                    return      cost        bench       turnover
    date
    2017-01-04  0.003421    0.000864    0.011693    0.576325
    2017-01-05  0.000508    0.000447    0.000721    0.227882
    2017-01-06  -0.003321   0.000212    -0.004322   0.102765
    2017-01-09  0.006753    0.000212    0.006874    0.105864
    2017-01-10  -0.000416   0.000440    -0.003350   0.208396
    
  • label_data –

    D.features result; index is pd.MultiIndex, index names are [instrument, datetime]; column names are [label].

    The label T is the change from T to T+1, it is recommended to use close, example: D.features(D.instruments('csi500'), ['Ref($close, -1)/$close-1'])

                                    label
    instrument  datetime
    SH600004    2017-12-11  -0.013502
                2017-12-12  -0.072367
                2017-12-13  -0.068605
                2017-12-14  0.012440
                2017-12-15  -0.102778
  • show_notebook – True or False. If True, show graph in notebook, else return figures
  • start_date – start date
  • end_date – end date
Returns:

if show_notebook is True, display in notebook; else return plotly.graph_objs.Figure list.

qlib.contrib.report.analysis_position.risk_analysis.risk_analysis_graph(analysis_df: pandas.core.frame.DataFrame = None, report_normal_df: pandas.core.frame.DataFrame = None, report_long_short_df: pandas.core.frame.DataFrame = None, show_notebook: bool = True) → Iterable[plotly.graph_objs._figure.Figure]

Generate analysis graph and monthly analysis

Example:

import pandas as pd
from qlib.contrib.evaluate import risk_analysis, backtest, long_short_backtest
from qlib.contrib.strategy import TopkDropoutStrategy
from qlib.contrib.report import analysis_position

# backtest parameters
bparas = {}
bparas['limit_threshold'] = 0.095
bparas['account'] = 1000000000

sparas = {}
sparas['topk'] = 50
sparas['n_drop'] = 230
strategy = TopkDropoutStrategy(**sparas)

report_normal_df, positions = backtest(pred_df, strategy=strategy, **bparas)
# long_short_map = long_short_backtest(pred_df)
# report_long_short_df = pd.DataFrame(long_short_map)

analysis = dict()
# analysis['pred_long'] = risk_analysis(report_long_short_df['long'])
# analysis['pred_short'] = risk_analysis(report_long_short_df['short'])
# analysis['pred_long_short'] = risk_analysis(report_long_short_df['long_short'])
analysis['excess_return_without_cost'] = risk_analysis(report_normal_df['return'] - report_normal_df['bench'])
analysis['excess_return_with_cost'] = risk_analysis(report_normal_df['return'] - report_normal_df['bench'] - report_normal_df['cost'])
analysis_df = pd.concat(analysis)

analysis_position.risk_analysis_graph(analysis_df, report_normal_df)
Parameters:
  • analysis_df

    analysis data, index is pd.MultiIndex; column names are [risk].

                                                      risk
    excess_return_without_cost mean               0.000692
                               std                0.005374
                               annualized_return  0.174495
                               information_ratio  2.045576
                               max_drawdown      -0.079103
    excess_return_with_cost    mean               0.000499
                               std                0.005372
                               annualized_return  0.125625
                               information_ratio  1.473152
                               max_drawdown      -0.088263
    
  • report_normal_df

    df.index.name must be date, df.columns must contain return, turnover, cost, bench.

                return      cost        bench       turnover
    date
    2017-01-04  0.003421    0.000864    0.011693    0.576325
    2017-01-05  0.000508    0.000447    0.000721    0.227882
    2017-01-06  -0.003321   0.000212    -0.004322   0.102765
    2017-01-09  0.006753    0.000212    0.006874    0.105864
    2017-01-10  -0.000416   0.000440    -0.003350   0.208396
    
  • report_long_short_df

    df.index.name must be date, df.columns contain long, short, long_short.

                long        short       long_short
    date
    2017-01-04  -0.001360   0.001394    0.000034
    2017-01-05  0.002456    0.000058    0.002514
    2017-01-06  0.000120    0.002739    0.002859
    2017-01-09  0.001436    0.001838    0.003273
    2017-01-10  0.000824    -0.001944   -0.001120
    
  • show_notebook – Whether to display graphics in a notebook, default True. If True, show the graph in the notebook; if False, return the graph figures.
Returns:

if show_notebook is True, display in notebook; else return plotly.graph_objs.Figure list.

qlib.contrib.report.analysis_position.rank_label.rank_label_graph(position: dict, label_data: pandas.core.frame.DataFrame, start_date=None, end_date=None, show_notebook=True) → Iterable[plotly.graph_objs._figure.Figure]

Ranking percentage of stocks bought, sold, and held on the trading day: the average rank-ratio (similar to sell_df['label'].rank(ascending=False) / len(sell_df)) of daily trading.
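The rank-ratio mentioned above can be sketched without pandas as follows. The function name rank_ratio is hypothetical, and ties are broken by input order here, whereas pandas' rank averages tied ranks by default.

```python
def rank_ratio(labels):
    """Descending rank of each label divided by the number of labels."""
    order = sorted(range(len(labels)), key=lambda i: labels[i], reverse=True)
    ranks = [0] * len(labels)
    for position, i in enumerate(order, start=1):
        ranks[i] = position  # rank 1 is the highest label
    return [r / len(labels) for r in ranks]


ratios = rank_ratio([0.03, -0.01, 0.02, 0.00])
average_ratio = sum(ratios) / len(ratios)
```

A small ratio means the stock's label ranked near the top that day, so a low average on the buy side and a high average on the sell side would be the favorable pattern.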

Example:

from qlib.data import D
from qlib.contrib.evaluate import backtest
from qlib.contrib.strategy import TopkDropoutStrategy
from qlib.contrib.report import analysis_position

# backtest parameters
bparas = {}
bparas['limit_threshold'] = 0.095
bparas['account'] = 1000000000

# strategy parameters
sparas = {}
sparas['topk'] = 50
sparas['n_drop'] = 230
strategy = TopkDropoutStrategy(**sparas)

# pass the strategy by keyword; the second positional parameter of backtest is account
_, positions = backtest(pred_df, strategy=strategy, **bparas)

pred_df_dates = pred_df.index.get_level_values(level='datetime')
features_df = D.features(D.instruments('csi500'), ['Ref($close, -1)/$close-1'], pred_df_dates.min(), pred_df_dates.max())
features_df.columns = ['label']

analysis_position.rank_label_graph(positions, features_df, pred_df_dates.min(), pred_df_dates.max())
Parameters:
  • position – position data; qlib.contrib.backtest.backtest.backtest result.
  • label_data –

    D.features result; index is pd.MultiIndex, index names are [instrument, datetime]; column names are [label].

    The label T is the change from T to T+1, it is recommended to use close, example: D.features(D.instruments('csi500'), ['Ref($close, -1)/$close-1']).

                                    label
    instrument  datetime
    SH600004    2017-12-11  -0.013502
                2017-12-12  -0.072367
                2017-12-13  -0.068605
                2017-12-14  0.012440
                2017-12-15  -0.102778
  • start_date – start date
  • end_date – end date
  • show_notebook – True or False. If True, show graph in notebook, else return figures.
Returns:

if show_notebook is True, display in notebook; else return plotly.graph_objs.Figure list.

qlib.contrib.report.analysis_model.analysis_model_performance.ic_figure(ic_df: pandas.core.frame.DataFrame, show_nature_day=True, **kwargs) → plotly.graph_objs._figure.Figure

IC figure

Parameters:
  • ic_df – ic DataFrame
  • show_nature_day – whether to display the abscissa of non-trading day
Returns:

plotly.graph_objs.Figure

qlib.contrib.report.analysis_model.analysis_model_performance.model_performance_graph(pred_label: pandas.core.frame.DataFrame, lag: int = 1, N: int = 5, reverse=False, rank=False, graph_names: list = ['group_return', 'pred_ic', 'pred_autocorr'], show_notebook: bool = True, show_nature_day=True) → [<class 'list'>, <class 'tuple'>]

Model performance

Parameters:
  • pred_label –

    index is pd.MultiIndex, index names are [instrument, datetime]; column names are [score, label]. It is usually the same as the label of model training (e.g. "Ref($close, -2)/Ref($close, -1) - 1").

    instrument  datetime    score       label
    SH600004    2017-12-11  -0.013502   -0.013502
                2017-12-12  -0.072367   -0.072367
                2017-12-13  -0.068605   -0.068605
                2017-12-14  0.012440    0.012440
                2017-12-15  -0.102778   -0.102778

  • lag – pred.groupby(level='instrument')['score'].shift(lag). It is only used in the auto-correlation computation.
  • N – group number, default 5.
  • reverse – if True, pred[‘score’] *= -1.
  • rank – if True, calculate rank ic.
  • graph_names – graph names; default ['group_return', 'pred_ic', 'pred_autocorr'].
  • show_notebook – whether to display graphics in notebook, the default is True.
  • show_nature_day – whether to display the abscissa of non-trading day.
Returns:

if show_notebook is True, display in notebook; else return plotly.graph_objs.Figure list.
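For readers unfamiliar with the rank option, the sketch below contrasts an information coefficient computed as the Pearson correlation between score and label with a rank IC computed on ranks instead. These are the conventional definitions and are assumed here rather than taken from the library; the helper names are hypothetical and ties are ignored.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def to_ranks(values):
    """Ascending ranks starting at 1 (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = float(position)
    return ranks


def daily_ic(scores, labels, rank=False):
    """IC for one date; with rank=True the correlation is taken on ranks."""
    if rank:
        scores, labels = to_ranks(scores), to_ranks(labels)
    return pearson(scores, labels)


scores = [0.1, 0.4, 0.2, 0.9]
labels = [0.01, 0.03, 0.02, 0.50]
ic = daily_ic(scores, labels)
rank_ic = daily_ic(scores, labels, rank=True)
```

Here the ordering of scores matches the ordering of labels exactly, so the rank IC is 1 while the plain IC is slightly lower because of the one large label.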

Workflow

Experiment Manager

class qlib.workflow.expm.ExpManager(uri, default_exp_name)

This is the ExpManager class for managing experiments. The API is designed similar to mlflow. (The link: https://mlflow.org/docs/latest/python_api/mlflow.html)

__init__(uri, default_exp_name)

Initialize self. See help(type(self)) for accurate signature.

start_exp(experiment_name=None, recorder_name=None, uri=None, **kwargs)

Start an experiment. This method first gets or creates an experiment, and then sets it to be active.

Parameters:
  • experiment_name (str) – name of the active experiment.
  • recorder_name (str) – name of the recorder to be started.
  • uri (str) – the current tracking URI.
Returns:

An active experiment.

end_exp(recorder_status: str = 'SCHEDULED', **kwargs)

End an active experiment.

Parameters:
  • experiment_name (str) – name of the active experiment.
  • recorder_status (str) – the status of the active recorder of the experiment.
create_exp(experiment_name=None)

Create an experiment.

Parameters:experiment_name (str) – the experiment name, which must be unique.
Returns:An experiment object.
search_records(experiment_ids=None, **kwargs)

Get a pandas DataFrame of records that fit the search criteria of the experiment. Inputs are the search criteria the user wants to apply.

Returns:

A pandas.DataFrame of records, where each metric, parameter, and tag are expanded into their own columns named metrics.*, params.*, and tags.* respectively. For records that don’t have a particular metric, parameter, or tag, their value will be (NumPy) NaN, None, or None respectively.
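The column layout described above can be illustrated with a small plain-Python sketch that flattens records into prefixed columns and fills the gaps. The record structure (a dict with optional "metrics", "params" and "tags" sub-dicts) and the helper names are assumptions for illustration only.

```python
def flatten_record(record):
    """Expand one record's metrics, params, and tags into prefixed columns."""
    row = {}
    for group in ("metrics", "params", "tags"):
        for key, value in record.get(group, {}).items():
            row[f"{group}.{key}"] = value
    return row


def to_table(records):
    """Build rows that share one column set, filling missing entries."""
    rows = [flatten_record(r) for r in records]
    columns = sorted({column for row in rows for column in row})

    def missing(column):
        # Missing metrics become NaN; missing params and tags become None.
        return float("nan") if column.startswith("metrics.") else None

    return [{c: row.get(c, missing(c)) for c in columns} for row in rows]


table = to_table([
    {"metrics": {"ic": 0.05}, "params": {"lr": 0.1}},
    {"params": {"lr": 0.2}, "tags": {"stage": "dev"}},
])
```

Each row ends up with the same three columns (metrics.ic, params.lr, tags.stage), which is what makes the result easy to filter and sort as a DataFrame.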
get_exp(experiment_id=None, experiment_name=None, create: bool = True)

Retrieve an experiment. This method includes getting an active experiment, and get_or_create a specific experiment. The returned experiment will be active.

When the user specifies an experiment id and name, the method will try to return the specific experiment. When the user does not provide an experiment id or name, the method will try to return the current active experiment. The create argument determines whether the method will automatically create a new experiment according to the user’s specification if the experiment hasn’t been created before.

  • If create is True:

    • If active experiment exists:

      • no id or name specified, return the active experiment.
      • if id or name is specified, return the specified experiment. If no such exp found, create a new experiment with given id or name, and the experiment is set to be active.
    • If no active experiment exists:

      • no id or name specified, create a default experiment.
      • if id or name is specified, return the specified experiment. If no such exp found, create a new experiment with given id or name, and the experiment is set to be active.
  • Else If create is False:

    • If active experiment exists:

      • no id or name specified, return the active experiment.
      • if id or name is specified, return the specified experiment. If no such exp found, raise Error.
    • If no active experiment exists:

      • no id or name specified. If the default experiment exists, return it, otherwise, raise Error.
      • if id or name is specified, return the specified experiment. If no such exp found, raise Error.
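The branches above can be condensed into a short lookup routine. The toy store below is a sketch under assumptions, not the ExpManager implementation: the class name ExperimentStoreSketch is hypothetical, experiments are represented by their names only, lookup by id is omitted, and ValueError stands in for the unspecified Error.

```python
class ExperimentStoreSketch:
    """Toy in-memory store that mimics the get-or-create rules."""

    def __init__(self, default_exp_name="Experiment"):
        self.experiments = set()
        self.active = None
        self.default_exp_name = default_exp_name

    def get_exp(self, experiment_name=None, create=True):
        if experiment_name is None:
            if self.active is not None:
                return self.active                   # no name given: active one
            experiment_name = self.default_exp_name  # otherwise use the default
        if experiment_name not in self.experiments:
            if not create:
                raise ValueError(f"No experiment named {experiment_name!r}")
            self.experiments.add(experiment_name)
            self.active = experiment_name            # newly created becomes active
        return experiment_name


store = ExperimentStoreSketch()
first = store.get_exp()                 # nothing active: creates the default
second = store.get_exp("exp_a")         # unknown name: created and activated
current = store.get_exp(create=False)   # no name: returns the active experiment
```

With create=False the same routine never adds anything, so asking for an unknown name raises instead of silently creating an experiment.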
Parameters:
  • experiment_id (str) – id of the experiment to return.
  • experiment_name (str) – name of the experiment to return.
  • create (boolean) – create the experiment if it hasn’t been created before.
Returns:

An experiment object.

delete_exp(experiment_id=None, experiment_name=None)

Delete an experiment.

Parameters:
  • experiment_id (str) – the experiment id.
  • experiment_name (str) – the experiment name.
get_uri()

Get the default tracking URI or current URI.

Returns:The tracking URI string.
list_experiments()

List all the existing experiments.

Returns:A dictionary (name -> experiment) of experiment information that is being stored.

Experiment

class qlib.workflow.exp.Experiment(id, name)

This is the Experiment class for each experiment being run. The API is designed similar to mlflow. (The link: https://mlflow.org/docs/latest/python_api/mlflow.html)

__init__(id, name)

Initialize self. See help(type(self)) for accurate signature.

start(recorder_name=None)

Start the experiment and set it to be active. This method will also start a new recorder.

Parameters:recorder_name (str) – the name of the recorder to be created.
Returns:An active recorder.
end(recorder_status='SCHEDULED')

End the experiment.

Parameters:recorder_status (str) – the status the recorder to be set with when ending (SCHEDULED, RUNNING, FINISHED, FAILED).
create_recorder(recorder_name=None)

Create a recorder for each experiment.

Parameters:recorder_name (str) – the name of the recorder to be created.
Returns:A recorder object.
search_records(**kwargs)

Get a pandas DataFrame of records that fit the search criteria of the experiment. Inputs are the search criteria the user wants to apply.

Returns:

A pandas.DataFrame of records, where each metric, parameter, and tag are expanded into their own columns named metrics.*, params.*, and tags.* respectively. For records that don’t have a particular metric, parameter, or tag, their value will be (NumPy) NaN, None, or None respectively.
delete_recorder(recorder_id)

Delete a recorder from the experiment.

Parameters:recorder_id (str) – the id of the recorder to be deleted.
get_recorder(recorder_id=None, recorder_name=None, create: bool = True)

Retrieve a recorder for the user. When the user specifies a recorder id and name, the method will try to return the specific recorder. When the user does not provide a recorder id or name, the method will try to return the current active recorder. The create argument determines whether the method will automatically create a new recorder according to the user’s specification if the recorder hasn’t been created before.

  • If create is True:

    • If active recorder exists:

      • no id or name specified, return the active recorder.
      • if id or name is specified, return the specified recorder. If no such recorder is found, create a new recorder with given id or name, and the recorder should be active.
    • If no active recorder exists:

      • no id or name specified, create a new recorder.
      • if id or name is specified, return the specified recorder. If no such recorder is found, create a new recorder with given id or name, and the recorder should be active.
  • Else If create is False:

    • If active recorder exists:

      • no id or name specified, return the active recorder.
      • if id or name is specified, return the specified recorder. If no such recorder is found, raise Error.
    • If no active recorder exists:

      • no id or name specified, raise Error.
      • if id or name is specified, return the specified recorder. If no such recorder is found, raise Error.
Parameters:
  • recorder_id (str) – the id of the recorder to return.
  • recorder_name (str) – the name of the recorder to return.
  • create (boolean) – create the recorder if it hasn’t been created before.
Returns:

A recorder object.

list_recorders()

List all the existing recorders of this experiment. Please first get the experiment instance before calling this method. To use the method R.list_recorders(), please refer to the related API document in QlibRecorder.

Returns:A dictionary (id -> recorder) of recorder information that is being stored.

Recorder

class qlib.workflow.recorder.Recorder(experiment_id, name)

This is the Recorder class for logging the experiments. The API is designed similar to mlflow. (The link: https://mlflow.org/docs/latest/python_api/mlflow.html)

The status of the recorder can be SCHEDULED, RUNNING, FINISHED, FAILED.

__init__(experiment_id, name)

Initialize self. See help(type(self)) for accurate signature.

save_objects(local_path=None, artifact_path=None, **kwargs)

Save objects such as prediction file or model checkpoints to the artifact URI. User can save object through keywords arguments (name:value).

Parameters:
  • local_path (str) – if provided, then save the file or directory to the artifact URI.
  • artifact_path (str) – the relative path for the artifact to be stored in the URI.
load_object(name)

Load objects such as prediction file or model checkpoints.

Parameters:name (str) – name of the file to be loaded.
Returns:The saved object.
start_run()

Start running or resuming the Recorder. The return value can be used as a context manager within a with block; otherwise, you must call end_run() to terminate the current run. (See ActiveRun class in mlflow)

Returns:An active running object (e.g. mlflow.ActiveRun object)
end_run()

End an active Recorder.

log_params(**kwargs)

Log a batch of params for the current run.

Parameters:arguments (keyword) – key, value pair to be logged as parameters.
log_metrics(step=None, **kwargs)

Log multiple metrics for the current run.

Parameters:arguments (keyword) – key, value pair to be logged as metrics.
set_tags(**kwargs)

Log a batch of tags for the current run.

Parameters:arguments (keyword) – key, value pair to be logged as tags.
delete_tags(*keys)

Delete some tags from a run.

Parameters:keys (series of strs of the keys) – all the names of the tags to be deleted.
list_artifacts(artifact_path: str = None)

List all the artifacts of a recorder.

Parameters:artifact_path (str) – the relative path for the artifact to be stored in the URI.
Returns:A list of artifact information (name, path, etc.) that is being stored.
list_metrics()

List all the metrics of a recorder.

Returns:A dictionary of the stored metrics.
list_params()

List all the params of a recorder.

Returns:A dictionary of the stored params.
list_tags()

List all the tags of a recorder.

Returns:A dictionary of the stored tags.
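
How the logging methods and the list_* methods fit together can be shown with an in-memory stand-in. ToyRunLog below is hypothetical and only mirrors the method names of this section; in particular, keeping only the latest value per metric and ignoring step are simplifying assumptions.

```python
class ToyRunLog:
    """Hypothetical in-memory log mirroring the method names above (not Qlib code)."""

    def __init__(self):
        self._params, self._metrics, self._tags = {}, {}, {}

    def log_params(self, **kwargs):
        self._params.update(kwargs)

    def log_metrics(self, step=None, **kwargs):
        # Assumption: only the latest value per metric is kept; `step` is ignored.
        self._metrics.update(kwargs)

    def set_tags(self, **kwargs):
        self._tags.update(kwargs)

    def delete_tags(self, *keys):
        for key in keys:
            del self._tags[key]

    def list_params(self):
        return dict(self._params)

    def list_metrics(self):
        return dict(self._metrics)

    def list_tags(self):
        return dict(self._tags)


log = ToyRunLog()
log.log_params(learning_rate=0.01, n_estimators=100)
log.log_metrics(step=1, ic=0.04)
log.set_tags(owner="alice", stage="dev")
log.delete_tags("stage")  # keys are passed positionally, not as keywords
```

Note the asymmetry: values are logged as keyword arguments, while delete_tags takes the tag names as positional arguments.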

Record Template

class qlib.workflow.record_temp.RecordTemp(recorder)

This is the Records Template class that enables users to generate experiment results such as IC and backtest in a certain format.

__init__(recorder)

Initialize self. See help(type(self)) for accurate signature.

generate(**kwargs)

Generate certain records such as IC, backtest etc., and save them.

Parameters:kwargs
load(name)

Load the stored records. Because of difficulties in balancing a clean API with Python's inheritance, this method currently has to be used in a rather ugly way; we will try to fix this in the future:

from qlib.workflow.record_temp import SigAnaRecord
# `recorder` is an existing Recorder instance
sar = SigAnaRecord(recorder)
ic = sar.load(sar.get_path("ic.pkl"))
Parameters:name (str) – the name of the file to be loaded.
Returns:The stored records.
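
The reason load is combined with get_path can be illustrated with a toy hierarchy. This is a sketch under assumptions: the text above only shows that get_path is called before load, so the prefixing behaviour, the "sig_analysis" directory name and the Toy* classes are all hypothetical.

```python
class ToyTemplate:
    """Hypothetical record template (not Qlib code)."""

    artifact_path = None  # subclasses may keep their files in a sub-directory

    def __init__(self, store):
        self.store = store  # a plain dict standing in for the recorder's artifacts

    @classmethod
    def get_path(cls, name):
        # Assumed behaviour: prefix the file name with the class's directory.
        return name if cls.artifact_path is None else f"{cls.artifact_path}/{name}"

    def load(self, name):
        # `load` expects the full stored name, hence the get_path call by the caller.
        return self.store[name]


class ToySigAna(ToyTemplate):
    artifact_path = "sig_analysis"  # hypothetical directory name


store = {"sig_analysis/ic.pkl": [0.03, 0.05]}
sar = ToySigAna(store)
ic = sar.load(sar.get_path("ic.pkl"))
```

In this picture, the caller resolves the class-specific location first, so load itself stays identical across subclasses.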
list()

List the stored records.

Returns:A list of all the stored records.
check(parent=False)

Check whether the records are properly generated and saved.

Raises:FileExistsError – if the records are not stored properly.
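
One plausible reading of check is that it compares the files a template is expected to produce with what is actually stored. The sketch below follows that reading; ToyRecords, the file names, and the choice to raise on the first missing file are assumptions, not the library's code.

```python
class ToyRecords:
    """Hypothetical template backed by a set of stored file names (not Qlib code)."""

    def __init__(self, stored):
        self.stored = set(stored)

    def list(self):
        # Files this template is expected to have produced (placeholder names).
        return ["ic.pkl", "ric.pkl"]

    def check(self):
        # Raise for the first expected file that is missing from storage.
        for name in self.list():
            if name not in self.stored:
                raise FileExistsError(name)


ToyRecords({"ic.pkl", "ric.pkl"}).check()  # everything present: no exception

try:
    ToyRecords({"ic.pkl"}).check()
    missing = None
except FileExistsError as err:
    missing = err.args[0]  # name of the file that was not found
```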

class qlib.workflow.record_temp.SignalRecord(model=None, dataset=None, recorder=None, **kwargs)

This is the Signal Record class that generates the signal prediction. This class inherits the RecordTemp class.

__init__(model=None, dataset=None, recorder=None, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

generate(**kwargs)

Generate certain records such as IC, backtest etc., and save them.

Parameters:kwargs
list()

List the stored records.

Returns:A list of all the stored records.
load(name='pred.pkl')

Load the stored records. Because of difficulties in balancing a clean API with Python's inheritance, this method currently has to be used in a rather ugly way; we will try to fix this in the future:

from qlib.workflow.record_temp import SigAnaRecord
# `recorder` is an existing Recorder instance
sar = SigAnaRecord(recorder)
ic = sar.load(sar.get_path("ic.pkl"))
Parameters:name (str) – the name of the file to be loaded.
Returns:The stored records.
class qlib.workflow.record_temp.SigAnaRecord(recorder, ana_long_short=False, ann_scaler=252, **kwargs)

This is the Signal Analysis Record class that generates the analysis results such as IC and IR. This class inherits the RecordTemp class.
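
For readers unfamiliar with the terms, IC and IR can be illustrated on toy data. The sketch assumes one common definition (IC as the per-day Pearson correlation between predictions and labels, IR as the mean of the daily IC divided by its standard deviation); it does not claim to reproduce how this class computes them, and all data below is made up.

```python
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Toy (prediction, label) pairs for three trading days and three instruments.
days = [
    ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]),  # perfectly aligned
    ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]),  # perfectly reversed
    ([1.0, 2.0, 3.0], [1.0, 3.0, 2.0]),  # partially aligned
]

daily_ic = [pearson(pred, label) for pred, label in days]
ic_mean = mean(daily_ic)
ic_ir = ic_mean / stdev(daily_ic)  # information ratio of the daily IC series
```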

__init__(recorder, ana_long_short=False, ann_scaler=252, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

generate()

Generate certain records such as IC, backtest etc., and save them.

Parameters:kwargs
list()

List the stored records.

Returns:A list of all the stored records.
class qlib.workflow.record_temp.PortAnaRecord(recorder, config, **kwargs)

This is the Portfolio Analysis Record class that generates the analysis results such as those of backtest. This class inherits the RecordTemp class.

The following files will be stored in the recorder:

  • report_normal.pkl & positions_normal.pkl : The return report and detailed positions of the backtest, returned by qlib/contrib/evaluate.py:backtest
  • port_analysis.pkl : The risk analysis of your portfolio, returned by qlib/contrib/evaluate.py:risk_analysis
__init__(recorder, config, **kwargs)

Parameters:
  • config["strategy"] (dict) – defines the strategy class as well as its kwargs.
  • config["backtest"] (dict) – defines the backtest kwargs.
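
The overall shape of the config argument for PortAnaRecord can be sketched as a nested dict. Only the two top-level keys, "strategy" and "backtest", come from the description of this constructor; the inner keys, the class name and all values are placeholders chosen for illustration, and validate_config is a hypothetical helper.

```python
# Hypothetical config; only the two top-level keys are documented.
port_analysis_config = {
    "strategy": {
        "class": "MyStrategy",       # placeholder strategy class name
        "kwargs": {"topk": 50},      # placeholder constructor arguments
    },
    "backtest": {
        "start_time": "2017-01-01",  # placeholder backtest arguments
        "end_time": "2020-08-01",
    },
}


def validate_config(config):
    """Check that both documented top-level entries are present and are dicts."""
    for key in ("strategy", "backtest"):
        if not isinstance(config.get(key), dict):
            raise ValueError(f"config[{key!r}] must be a dict")
    return True


is_valid = validate_config(port_analysis_config)
```

A check of this kind before constructing the record can surface a missing entry early, rather than midway through a backtest.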
generate(**kwargs)

Generate certain records such as IC, backtest etc., and save them.

Parameters:kwargs
list()

List the stored records.

Returns:A list of all the stored records.