API Reference
Here you can find all Qlib interfaces.
Data
Provider
-
class
qlib.data.data.
CalendarProvider
(*args, **kwargs)¶ Calendar provider base class
Provide calendar data.
-
__init__
(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
calendar
(start_time=None, end_time=None, freq='day', future=False)¶ Get calendar of certain market in given time range.
Parameters: - start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- freq (str) – time frequency, available: year/quarter/month/week/day.
- future (bool) – whether to include future trading days.
Returns: calendar list
Return type: list
-
locate_index
(start_time, end_time, freq, future)¶ Locate the start time index and end time index in a calendar under certain frequency.
Parameters: - start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- freq (str) – time frequency, available: year/quarter/month/week/day.
- future (bool) – whether to include future trading days.
Returns: - pd.Timestamp – the real start time.
- pd.Timestamp – the real end time.
- int – the index of start time.
- int – the index of end time.
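The lookup that locate_index describes can be pictured as a binary search over a sorted list of trading days. The sketch below is not Qlib's implementation: the function name locate_index_sketch, the sample calendar, and the snapping rule (the start moves forward to the next trading day, the end moves back to the previous one) are assumptions made for illustration.

```python
import bisect
from datetime import date


def locate_index_sketch(calendar, start_time, end_time):
    """Return (real_start, real_end, start_index, end_index) for a sorted calendar."""
    # Snap the start forward to the first trading day on or after start_time.
    start_index = bisect.bisect_left(calendar, start_time)
    # Snap the end back to the last trading day on or before end_time.
    end_index = bisect.bisect_right(calendar, end_time) - 1
    if start_index >= len(calendar) or end_index < 0 or start_index > end_index:
        raise ValueError("no trading day in the requested range")
    return calendar[start_index], calendar[end_index], start_index, end_index


# Hypothetical calendar: four trading days with a weekend gap in the middle.
calendar = [date(2020, 1, 2), date(2020, 1, 3), date(2020, 1, 6), date(2020, 1, 7)]
# 2020-01-04 is not a trading day here, so the real start becomes 2020-01-06.
result = locate_index_sketch(calendar, date(2020, 1, 4), date(2020, 1, 6))
print(result)
```

The returned tuple follows the order of the four return values listed above: the two snapped timestamps first, then their positions in the calendar.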
-
-
class
qlib.data.data.
InstrumentProvider
(*args, **kwargs)¶ Instrument provider base class
Provide instrument data.
-
__init__
(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
static
instruments
(market='all', filter_pipe=None)¶ Get the general config dictionary for a base market adding several dynamic filters.
Parameters: - market (str) – market/industry/index shortname, e.g. all/sse/szse/sse50/csi300/csi500.
- filter_pipe (list) – the list of dynamic filters.
Returns: dict of stockpool config. {`market`=>base market name, `filter_pipe`=>list of filters}
example :
Return type: dict
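As a rough illustration of the returned structure, the sketch below builds a dictionary with the two keys named in the Returns description. It is not the Qlib implementation: the helper name stockpool_config_sketch and the shape of the filter entry are assumptions.

```python
def stockpool_config_sketch(market="all", filter_pipe=None):
    """Build a dict shaped like the documented stockpool config."""
    # Create a fresh list on each call instead of sharing a mutable default.
    if filter_pipe is None:
        filter_pipe = []
    return {"market": market, "filter_pipe": list(filter_pipe)}


# The filter entry below is a hypothetical placeholder, not a real Qlib filter config.
config = stockpool_config_sketch("csi300", [{"filter_type": "ExampleFilter"}])
print(config)
```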
-
list_instruments
(instruments, start_time=None, end_time=None, freq='day', as_list=False)¶ List the instruments based on a certain stockpool config.
Parameters: - instruments (dict) – stockpool config.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- as_list (bool) – return instruments as list or dict.
Returns: instruments list or dictionary with time spans
Return type: dict or list
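The difference between the two return shapes can be shown with a small self-contained sketch. It assumes that a stock pool resolves to a mapping from instrument code to a list of (start, end) spans, and that spans are clipped to the requested range. Both points, like the name list_instruments_sketch and the sample codes, are assumptions for illustration rather than Qlib's actual behavior.

```python
from datetime import date


def list_instruments_sketch(spans_by_inst, start_time, end_time, as_list=False):
    """Keep instruments whose spans overlap [start_time, end_time]."""
    result = {}
    for inst, spans in spans_by_inst.items():
        # Keep only overlapping spans and clip them to the requested range.
        clipped = [
            (max(begin, start_time), min(end, end_time))
            for begin, end in spans
            if begin <= end_time and end >= start_time
        ]
        if clipped:
            result[inst] = clipped
    # as_list=True drops the time spans and returns only the codes.
    return sorted(result) if as_list else result


# Hypothetical pool: AAA was listed only in 2019, BBB only in 2020.
pool = {
    "AAA": [(date(2019, 1, 1), date(2019, 12, 31))],
    "BBB": [(date(2020, 1, 1), date(2020, 12, 31))],
}
as_dict = list_instruments_sketch(pool, date(2020, 3, 1), date(2020, 6, 30))
as_names = list_instruments_sketch(pool, date(2020, 3, 1), date(2020, 6, 30), as_list=True)
print(as_dict, as_names)
```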
-
-
class
qlib.data.data.
FeatureProvider
(*args, **kwargs)¶ Feature provider class
Provide feature data.
-
__init__
(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
feature
(instrument, field, start_time, end_time, freq)¶ Get feature data.
Parameters: - instrument (str) – a certain instrument.
- field (str) – a certain field of feature.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- freq (str) – time frequency, available: year/quarter/month/week/day.
Returns: data of a certain feature
Return type: pd.Series
-
-
class
qlib.data.data.
ExpressionProvider
¶ Expression provider class
Provide Expression data.
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
expression
(instrument, field, start_time=None, end_time=None, freq='day')¶ Get Expression data.
Parameters: - instrument (str) – a certain instrument.
- field (str) – a certain field of feature.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- freq (str) – time frequency, available: year/quarter/month/week/day.
Returns: data of a certain expression
Return type: pd.Series
-
-
class
qlib.data.data.
DatasetProvider
¶ Dataset provider class
Provide Dataset data.
-
dataset
(instruments, fields, start_time=None, end_time=None, freq='day', inst_processors=[])¶ Get dataset data.
Parameters: - instruments (list or dict) – list/dict of instruments or dict of stockpool config.
- fields (list) – list of feature instances.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- freq (str) – time frequency.
- inst_processors (Iterable[Union[dict, InstProcessor]]) – the operations performed on each instrument
Returns: a pandas dataframe with <instrument, datetime> index.
Return type: pd.DataFrame
-
static
get_instruments_d
(instruments, freq)¶ Parse different types of input instruments into the output instruments_d. An input in the wrong format will raise an exception.
-
static
get_column_names
(fields)¶ Get column names from input fields
-
static
dataset_processor
(instruments_d, column_names, start_time, end_time, freq, inst_processors=[])¶ Load and process the data, then return the dataset. By default this uses the multi-kernel method.
-
static
expression_calculator
(inst, start_time, end_time, freq, column_names, spans=None, g_config=None, inst_processors=[])¶ Calculate the expressions for one instrument and return the result as a DataFrame. If the expression has been calculated before, load it from the cache.
Returns: a DataFrame with index 'datetime' and other data columns.
-
-
class
qlib.data.data.
LocalCalendarProvider
(**kwargs)¶ Local calendar data provider class
Provide calendar data from local data source.
-
__init__
(**kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
load_calendar
(freq, future)¶ Load original calendar timestamp from file.
Parameters: freq (str) – frequency of read calendar file.
Returns: list of timestamps
Return type: list
-
calendar
(start_time=None, end_time=None, freq='day', future=False)¶ Get calendar of certain market in given time range.
Parameters: - start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- freq (str) – time frequency, available: year/quarter/month/week/day.
- future (bool) – whether to include future trading days.
Returns: calendar list
Return type: list
-
-
class
qlib.data.data.
LocalInstrumentProvider
(*args, **kwargs)¶ Local instrument data provider class
Provide instrument data from local data source.
-
list_instruments
(instruments, start_time=None, end_time=None, freq='day', as_list=False)¶ List the instruments based on a certain stockpool config.
Parameters: - instruments (dict) – stockpool config.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- as_list (bool) – return instruments as list or dict.
Returns: instruments list or dictionary with time spans
Return type: dict or list
-
-
class
qlib.data.data.
LocalFeatureProvider
(**kwargs)¶ Local feature data provider class
Provide feature data from local data source.
-
__init__
(**kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
feature
(instrument, field, start_index, end_index, freq)¶ Get feature data.
Parameters: - instrument (str) – a certain instrument.
- field (str) – a certain field of feature.
- start_index (int) – start index of the range in the calendar.
- end_index (int) – end index of the range in the calendar.
- freq (str) – time frequency, available: year/quarter/month/week/day.
Returns: data of a certain feature
Return type: pd.Series
-
-
class
qlib.data.data.
LocalExpressionProvider
¶ Local expression data provider class
Provide expression data from local data source.
-
expression
(instrument, field, start_time=None, end_time=None, freq='day')¶ Get Expression data.
Parameters: - instrument (str) – a certain instrument.
- field (str) – a certain field of feature.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- freq (str) – time frequency, available: year/quarter/month/week/day.
Returns: data of a certain expression
Return type: pd.Series
-
-
class
qlib.data.data.
LocalDatasetProvider
¶ Local dataset data provider class
Provide dataset data from local data source.
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
dataset
(instruments, fields, start_time=None, end_time=None, freq='day', inst_processors=[])¶ Get dataset data.
Parameters: - instruments (list or dict) – list/dict of instruments or dict of stockpool config.
- fields (list) – list of feature instances.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- freq (str) – time frequency.
- inst_processors (Iterable[Union[dict, InstProcessor]]) – the operations performed on each instrument
Returns: a pandas dataframe with <instrument, datetime> index.
Return type: pd.DataFrame
-
static
multi_cache_walker
(instruments, fields, start_time=None, end_time=None, freq='day')¶ This method is used to prepare the expression cache for the client. Then the client will load the data from expression cache by itself.
-
static
cache_walker
(inst, start_time, end_time, freq, column_names)¶ If the expressions of one instrument haven’t been calculated before, calculate it and write it into expression cache.
-
-
class
qlib.data.data.
ClientCalendarProvider
¶ Client calendar data provider class
Provide calendar data by requesting data from server as a client.
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
calendar
(start_time=None, end_time=None, freq='day', future=False)¶ Get calendar of certain market in given time range.
Parameters: - start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- freq (str) – time frequency, available: year/quarter/month/week/day.
- future (bool) – whether to include future trading days.
Returns: calendar list
Return type: list
-
-
class
qlib.data.data.
ClientInstrumentProvider
¶ Client instrument data provider class
Provide instrument data by requesting data from server as a client.
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
list_instruments
(instruments, start_time=None, end_time=None, freq='day', as_list=False)¶ List the instruments based on a certain stockpool config.
Parameters: - instruments (dict) – stockpool config.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- as_list (bool) – return instruments as list or dict.
Returns: instruments list or dictionary with time spans
Return type: dict or list
-
-
class
qlib.data.data.
ClientDatasetProvider
¶ Client dataset data provider class
Provide dataset data by requesting data from server as a client.
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
dataset
(instruments, fields, start_time=None, end_time=None, freq='day', disk_cache=0, return_uri=False, inst_processors=[])¶ Get dataset data.
Parameters: - instruments (list or dict) – list/dict of instruments or dict of stockpool config.
- fields (list) – list of feature instances.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
- freq (str) – time frequency.
- inst_processors (Iterable[Union[dict, InstProcessor]]) – the operations performed on each instrument
Returns: a pandas dataframe with <instrument, datetime> index.
Return type: pd.DataFrame
-
-
class
qlib.data.data.
BaseProvider
¶ Local provider class
Kept for compatibility with the old qlib provider.
-
features
(instruments, fields, start_time=None, end_time=None, freq='day', disk_cache=None, inst_processors=[])¶ Parameters: disk_cache (int) – whether to skip (0), use (1), or replace (2) the disk cache.
This function first tries the cache method, which accepts the keyword disk_cache, and falls back to the provider method if a TypeError is raised because the DatasetD instance is a provider class.
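The try-then-fall-back behavior described for features can be sketched in plain Python. Everything named here (features_sketch, PlainProvider, CachedProvider, the tuple return values) is hypothetical and only mirrors the pattern, not Qlib's code.

```python
def features_sketch(dataset_provider, instruments, fields, disk_cache=None):
    """Try the cache-aware call first, then fall back to the plain provider call."""
    try:
        # A cache wrapper is expected to accept the disk_cache keyword.
        return dataset_provider.dataset(instruments, fields, disk_cache=disk_cache)
    except TypeError:
        # A plain provider rejects the unknown keyword, so call it without it.
        return dataset_provider.dataset(instruments, fields)


class PlainProvider:
    """Hypothetical provider whose dataset() has no disk_cache keyword."""

    def dataset(self, instruments, fields):
        return ("plain", instruments, fields)


class CachedProvider:
    """Hypothetical cache wrapper whose dataset() accepts disk_cache."""

    def dataset(self, instruments, fields, disk_cache=0):
        return ("cached", disk_cache)


plain_result = features_sketch(PlainProvider(), ["AAA"], ["$close"], disk_cache=1)
cached_result = features_sketch(CachedProvider(), ["AAA"], ["$close"], disk_cache=1)
print(plain_result, cached_result)
```

One trade-off of this pattern is that a TypeError raised inside the cache-aware call for an unrelated reason also triggers the fallback, so the plain call may hide the original error.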
-
-
class
qlib.data.data.
LocalProvider
¶ -
features_uri
(instruments, fields, start_time, end_time, freq, disk_cache=1)¶ Return the uri of the generated cache of features/dataset
Parameters: - disk_cache –
- instruments –
- fields –
- start_time –
- end_time –
- freq –
-
-
class
qlib.data.data.
ClientProvider
¶ Client Provider
Requests data from the server as a client. It can make the following requests:
- Calendar: the server responds directly with a list of calendars
- Instruments (without filter): the server responds directly with a list/dict of instruments
- Instruments (with filters): the server responds with a list/dict of instruments
- Features: the server responds with a cache uri
The general workflow is as follows: when the user makes a request through the client provider, the client provider connects to the server and sends the request. The client then waits for the response. The server responds immediately, indicating whether the cache is available. The wait ends only when the client gets a response saying feature_available is true.
BUG: Every time we request data, we need to connect to the server, wait for the response, and disconnect from it. We can't make a sequence of requests within one connection.
Refer to https://python-socketio.readthedocs.io/en/latest/client.html for the documentation of the python-socketIO client.
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
qlib.data.data.
CalendarProviderWrapper
¶ alias of
qlib.data.data.CalendarProvider
-
qlib.data.data.
InstrumentProviderWrapper
¶ alias of
qlib.data.data.InstrumentProvider
-
qlib.data.data.
FeatureProviderWrapper
¶ alias of
qlib.data.data.FeatureProvider
-
qlib.data.data.
ExpressionProviderWrapper
¶ alias of
qlib.data.data.ExpressionProvider
-
qlib.data.data.
DatasetProviderWrapper
¶ alias of
qlib.data.data.DatasetProvider
-
qlib.data.data.
BaseProviderWrapper
¶ alias of
qlib.data.data.BaseProvider
-
qlib.data.data.
register_all_wrappers
(C)¶
Filter
-
class
qlib.data.filter.
BaseDFilter
¶ Dynamic Instruments Filter Abstract class
Users can override this class to construct their own filter
Override __init__ to input filter regulations
Override filter_main to use the regulations to filter instruments
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
static
from_config
(config)¶ Construct an instance from config dict.
Parameters: config (dict) – dict of config parameters.
-
to_config
()¶ Convert the instance to a config dict.
Returns: return the dict of config parameters. Return type: dict
-
-
class
qlib.data.filter.
SeriesDFilter
(fstart_time=None, fend_time=None)¶ Dynamic Instruments Filter Abstract class to filter a series of certain features
Filters should provide parameters:
- filter start time
- filter end time
- filter rule
Override __init__ to assign a certain rule to filter the series.
Override _getFilterSeries to use the rule to filter the series and get a dict of {inst => series}, or override filter_main for more advanced series filter rules.
-
__init__
(fstart_time=None, fend_time=None)¶ - Init function for filter base class.
- Filter a set of instruments based on a certain rule within a certain period assigned by fstart_time and fend_time.
Parameters: - fstart_time (str) – the time at which the filter rule starts filtering the instruments.
- fend_time (str) – the time at which the filter rule stops filtering the instruments.
-
filter_main
(instruments, start_time=None, end_time=None)¶ Implement this method to filter the instruments.
Parameters: - instruments (dict) – input instruments to be filtered.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
Returns: filtered instruments, same structure as input instruments.
Return type: dict
-
class
qlib.data.filter.
NameDFilter
(name_rule_re, fstart_time=None, fend_time=None)¶ Name dynamic instrument filter
Filter the instruments based on a regulated name format.
A name rule regular expression is required.
-
__init__
(name_rule_re, fstart_time=None, fend_time=None)¶ Init function for name filter class
- name_rule_re: str
- regular expression for the name rule.
-
static
from_config
(config)¶ Construct an instance from config dict.
Parameters: config (dict) – dict of config parameters.
-
to_config
()¶ Convert the instance to a config dict.
Returns: return the dict of config parameters. Return type: dict
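The from_config / to_config pair is meant to round-trip a filter through a plain dict. The sketch below shows that idea with a stand-alone class. It is not NameDFilter itself: the class name NameFilterSketch, the config keys, and the instrument codes are assumptions, and the fstart_time / fend_time window is stored but deliberately ignored to keep the example short.

```python
import re


class NameFilterSketch:
    """Keep instruments whose code matches a regular expression."""

    def __init__(self, name_rule_re, fstart_time=None, fend_time=None):
        self.name_rule_re = name_rule_re
        self.fstart_time = fstart_time
        self.fend_time = fend_time

    @staticmethod
    def from_config(config):
        # Rebuild an instance from the dict produced by to_config.
        return NameFilterSketch(
            name_rule_re=config["name_rule_re"],
            fstart_time=config.get("fstart_time"),
            fend_time=config.get("fend_time"),
        )

    def to_config(self):
        # Export the constructor arguments as a plain dict.
        return {
            "name_rule_re": self.name_rule_re,
            "fstart_time": self.fstart_time,
            "fend_time": self.fend_time,
        }

    def filter_main(self, instruments):
        # instruments maps an instrument code to its list of time spans.
        pattern = re.compile(self.name_rule_re)
        return {inst: spans for inst, spans in instruments.items() if pattern.match(inst)}


original = NameFilterSketch("SH[0-9]{6}")
restored = NameFilterSketch.from_config(original.to_config())
kept = restored.filter_main({"SH600000": [], "SZ000001": []})
print(kept)
```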
-
-
class
qlib.data.filter.
ExpressionDFilter
(rule_expression, fstart_time=None, fend_time=None, keep=False)¶ Expression dynamic instrument filter
Filter the instruments based on a certain expression.
An expression rule indicating a certain feature field is required.
Examples
- basic features filter : rule_expression = '$close/$open>5'
- cross-sectional features filter : rule_expression = '$rank($close)<10'
- time-sequence features filter : rule_expression = 'Ref($close, 3)>100'
-
__init__
(rule_expression, fstart_time=None, fend_time=None, keep=False)¶ Init function for expression filter class
- fstart_time: str
- filter the feature starting from this time.
- fend_time: str
- filter the feature ending by this time.
- rule_expression: str
- an input expression for the rule.
- keep: bool
- whether to keep instruments whose features do not exist in the filter time span.
-
static
from_config
(config)¶ Construct an instance from config dict.
Parameters: config (dict) – dict of config parameters.
-
to_config
()¶ Convert the instance to a config dict.
Returns: return the dict of config parameters. Return type: dict
Class
-
class
qlib.data.base.
Expression
¶ Expression base class
-
load
(instrument, start_index, end_index, freq)¶ load feature
Parameters: - instrument (str) – instrument code.
- start_index (str) – feature start index [in calendar].
- end_index (str) – feature end index [in calendar].
- freq (str) – feature frequency.
Returns: feature series: The index of the series is the calendar index
Return type: pd.Series
-
get_longest_back_rolling
()¶ Get the longest length of historical data the feature has accessed
This is designed for first getting the range of data needed to calculate the features in a specific range. However, situations like Ref(Ref($close, -1), 1) cannot be handled correctly.
So this is only used for detecting the length of historical data needed.
-
get_extended_window_size
()¶ get_extend_window_size
To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].
Returns: lft_etd, rght_etd Return type: (int, int)
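How the (lft_etd, rght_etd) pair could propagate through a nested expression is easier to see with a toy expression tree. The classes below (Leaf, RefSketch, RollingSketch) are hypothetical stand-ins, and the propagation rules are assumptions chosen to match the range formula above. They are not copied from Qlib.

```python
class Leaf:
    """A raw feature: it needs no data outside the requested range."""

    def get_extended_window_size(self):
        return 0, 0


class RefSketch:
    """Shift by n periods: n > 0 looks back, n < 0 looks ahead."""

    def __init__(self, feature, n):
        self.feature, self.n = feature, n

    def get_extended_window_size(self):
        lft_etd, rght_etd = self.feature.get_extended_window_size()
        # Looking back widens the left side; looking ahead widens the right side.
        return max(lft_etd + self.n, 0), max(rght_etd - self.n, 0)


class RollingSketch:
    """A trailing window of n periods needs n - 1 extra periods on the left."""

    def __init__(self, feature, n):
        self.feature, self.n = feature, n

    def get_extended_window_size(self):
        lft_etd, rght_etd = self.feature.get_extended_window_size()
        return lft_etd + self.n - 1, rght_etd


# A 5-period rolling window over a value shifted back by 2 periods.
expr = RollingSketch(RefSketch(Leaf(), 2), 5)
window = expr.get_extended_window_size()
# A one-period look-ahead needs one extra period on the right.
future = RefSketch(Leaf(), -1).get_extended_window_size()
print(window, future)
```

With these rules, evaluating expr over [start_index, end_index] requires the leaf feature over [start_index - 6, end_index].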
-
-
class
qlib.data.base.
Feature
(name=None)¶ Static Expression
This kind of feature will load data from provider
-
__init__
(name=None)¶ Initialize self. See help(type(self)) for accurate signature.
-
get_longest_back_rolling
()¶ Get the longest length of historical data the feature has accessed
This is designed for first getting the range of data needed to calculate the features in a specific range. However, situations like Ref(Ref($close, -1), 1) cannot be handled correctly.
So this is only used for detecting the length of historical data needed.
-
get_extended_window_size
()¶ get_extend_window_size
To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].
Returns: lft_etd, rght_etd Return type: (int, int)
-
-
class
qlib.data.base.
ExpressionOps
¶ Operator Expression
This kind of feature will use operator for feature construction on the fly.
Operator
-
class
qlib.data.ops.
ElemOperator
(feature)¶ Element-wise Operator
Parameters: feature (Expression) – feature instance Returns: feature operation output Return type: Expression -
__init__
(feature)¶ Initialize self. See help(type(self)) for accurate signature.
-
get_longest_back_rolling
()¶ Get the longest length of historical data the feature has accessed
This is designed for first getting the range of data needed to calculate the features in a specific range. However, situations like Ref(Ref($close, -1), 1) cannot be handled correctly.
So this is only used for detecting the length of historical data needed.
-
get_extended_window_size
()¶ get_extend_window_size
To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].
Returns: lft_etd, rght_etd Return type: (int, int)
-
-
class
qlib.data.ops.
NpElemOperator
(feature, func)¶ Numpy Element-wise Operator
Parameters: - feature (Expression) – feature instance
- func (str) – numpy feature operation method
Returns: feature operation output
Return type: -
__init__
(feature, func)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Abs
(feature)¶ Feature Absolute Value
Parameters: feature (Expression) – feature instance Returns: a feature instance with absolute output Return type: Expression -
__init__
(feature)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
qlib.data.ops.
Sign
(feature)¶ Feature Sign
Parameters: feature (Expression) – feature instance Returns: a feature instance with sign Return type: Expression -
__init__
(feature)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
qlib.data.ops.
Log
(feature)¶ Feature Log
Parameters: feature (Expression) – feature instance Returns: a feature instance with log Return type: Expression -
__init__
(feature)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
qlib.data.ops.
Power
(feature, exponent)¶ Feature Power
Parameters: feature (Expression) – feature instance Returns: a feature instance with power Return type: Expression -
__init__
(feature, exponent)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
qlib.data.ops.
Mask
(feature, instrument)¶ Feature Mask
Parameters: - feature (Expression) – feature instance
- instrument (str) – instrument mask
Returns: a feature instance with masked instrument
Return type: -
__init__
(feature, instrument)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Not
(feature)¶ Not Operator
Parameters: - feature (Expression) – feature instance
Returns: feature elementwise not output
Return type: -
__init__
(feature)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
PairOperator
(feature_left, feature_right)¶ Pair-wise operator
Parameters: - feature_left (Expression) – feature instance or numeric value
- feature_right (Expression) – feature instance or numeric value
- func (str) – operator function
Returns: two features’ operation output
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
get_longest_back_rolling
()¶ Get the longest length of historical data the feature has accessed
This is designed for first getting the range of data needed to calculate the features in a specific range. However, situations like Ref(Ref($close, -1), 1) cannot be handled correctly.
So this is only used for detecting the length of historical data needed.
-
get_extended_window_size
()¶ get_extend_window_size
To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].
Returns: lft_etd, rght_etd Return type: (int, int)
-
class
qlib.data.ops.
NpPairOperator
(feature_left, feature_right, func)¶ Numpy Pair-wise operator
Parameters: - feature_left (Expression) – feature instance or numeric value
- feature_right (Expression) – feature instance or numeric value
- func (str) – operator function
Returns: two features’ operation output
Return type: -
__init__
(feature_left, feature_right, func)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Add
(feature_left, feature_right)¶ Add Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: two features’ sum
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Sub
(feature_left, feature_right)¶ Subtract Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: two features’ subtraction
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Mul
(feature_left, feature_right)¶ Multiply Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: two features’ product
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Div
(feature_left, feature_right)¶ Division Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: two features’ division
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Greater
(feature_left, feature_right)¶ Greater Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: the larger elements taken from the two input features
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Less
(feature_left, feature_right)¶ Less Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: the smaller elements taken from the two input features
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Gt
(feature_left, feature_right)¶ Greater Than Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: bool series indicating whether left > right
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
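Greater and Gt are easy to confuse: going by the descriptions above, Greater keeps the larger value at each position, while Gt yields a boolean series. The sketch below contrasts the two on plain lists. The function names and sample values are hypothetical, and real operators work on feature series rather than lists.

```python
def greater_sketch(left, right):
    """Element-wise maximum, as the Greater description suggests."""
    return [max(a, b) for a, b in zip(left, right)]


def gt_sketch(left, right):
    """Element-wise comparison, as the Gt description suggests."""
    return [a > b for a, b in zip(left, right)]


# Hypothetical close and open prices for three periods.
close = [10.0, 12.0, 11.0]
open_ = [11.0, 11.5, 11.0]
larger = greater_sketch(close, open_)
mask = gt_sketch(close, open_)
print(larger, mask)
```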
-
class
qlib.data.ops.
Ge
(feature_left, feature_right)¶ Greater Than or Equal Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: bool series indicating whether left >= right
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Lt
(feature_left, feature_right)¶ Less Than Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: bool series indicating whether left < right
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Le
(feature_left, feature_right)¶ Less Than or Equal Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: bool series indicating whether left <= right
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Eq
(feature_left, feature_right)¶ Equal Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: bool series indicating whether left == right
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Ne
(feature_left, feature_right)¶ Not Equal Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: bool series indicating whether left != right
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
And
(feature_left, feature_right)¶ And Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: the row-by-row & of the two features
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Or
(feature_left, feature_right)¶ Or Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
Returns: the row-by-row | of the two features
Return type: -
__init__
(feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
If
(condition, feature_left, feature_right)¶ If Operator
Parameters: - condition (Expression) – feature instance with bool values as condition
- feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
-
__init__
(condition, feature_left, feature_right)¶ Initialize self. See help(type(self)) for accurate signature.
-
get_longest_back_rolling
()¶ Get the longest length of historical data the feature has accessed
This is designed for first getting the range of data needed to calculate the features in a specific range. However, situations like Ref(Ref($close, -1), 1) cannot be handled correctly.
So this is only used for detecting the length of historical data needed.
-
get_extended_window_size
()¶ get_extend_window_size
To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].
Returns: lft_etd, rght_etd Return type: (int, int)
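The reference gives no return description for If. A natural reading of its three parameters is element-wise selection: take the value from feature_left where the condition holds and from feature_right elsewhere. The sketch below illustrates that reading on plain lists. The function name if_sketch and this interpretation are assumptions.

```python
def if_sketch(condition, left, right):
    """Pick left[i] where condition[i] is true, otherwise right[i]."""
    return [a if c else b for c, a, b in zip(condition, left, right)]


selected = if_sketch([True, False, True], [1, 2, 3], [10, 20, 30])
print(selected)
```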
-
class
qlib.data.ops.
Rolling
(feature, N, func)¶ Rolling Operator
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
- func (str) – rolling method
Returns: rolling outputs
Return type: -
__init__
(feature, N, func)¶ Initialize self. See help(type(self)) for accurate signature.
-
get_longest_back_rolling
()¶ Get the longest length of historical data the feature has accessed
This is designed for first getting the range of data needed to calculate the features in a specific range. However, situations like Ref(Ref($close, -1), 1) cannot be handled correctly.
So this is only used for detecting the length of historical data needed.
-
get_extended_window_size
()¶ get_extend_window_size
To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].
Returns: lft_etd, rght_etd Return type: (int, int)
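A rolling operator with a named method, as in Rolling(feature, N, func), can be pictured as a reducer applied over a trailing window. The sketch below is a pure-Python illustration. The name rolling_sketch, the small set of supported methods, and the choice to evaluate partial windows at the start of the series are all assumptions.

```python
from statistics import mean

# Hypothetical mapping from a method name to a reducer over a window.
ROLLING_FUNCS = {"mean": mean, "sum": sum, "max": max, "min": min}


def rolling_sketch(values, n, func):
    """Apply func over a trailing window of up to n values ending at each position."""
    reducer = ROLLING_FUNCS[func]
    out = []
    for i in range(len(values)):
        # The window is shorter than n near the start of the series.
        window = values[max(0, i - n + 1): i + 1]
        out.append(reducer(window))
    return out


prices = [1.0, 2.0, 3.0, 4.0]
rolling_mean = rolling_sketch(prices, 2, "mean")
rolling_max = rolling_sketch(prices, 3, "max")
print(rolling_mean, rolling_max)
```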
-
class
qlib.data.ops.
Ref
(feature, N)¶ Feature Reference
Parameters: - feature (Expression) – feature instance
- N (int) – N = 0, retrieve the first data; N > 0, retrieve data of N periods ago; N < 0, future data
Returns: a feature instance with target reference
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
get_longest_back_rolling
()¶ Get the longest length of historical data the feature has accessed
This is designed for first getting the range of data needed to calculate the features in a specific range. However, situations like Ref(Ref($close, -1), 1) cannot be handled correctly.
So this is only used for detecting the length of historical data needed.
-
get_extended_window_size
()¶ get_extend_window_size
To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].
Returns: lft_etd, rght_etd Return type: (int, int)
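The three cases of N described for Ref can be illustrated on a plain list. This is a sketch under assumptions: the name ref_sketch is hypothetical, missing values are shown as None, and N = 0 is read as "every position gets the first data point".

```python
def ref_sketch(values, n):
    """Shift values by n periods; positions without data become None."""
    if n == 0:
        # N = 0: every position refers to the first data point.
        return [values[0]] * len(values) if values else []
    if n > 0:
        # N > 0: each position sees the value from n periods ago.
        return [None] * min(n, len(values)) + values[: max(len(values) - n, 0)]
    # N < 0: each position sees the value from -n periods in the future.
    k = -n
    return values[k:] + [None] * min(k, len(values))


close = [1, 2, 3, 4]
past = ref_sketch(close, 1)
ahead = ref_sketch(close, -1)
first = ref_sketch(close, 0)
print(past, ahead, first)
```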
-
class
qlib.data.ops.
Mean
(feature, N)¶ Rolling Mean (MA)
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling average
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Sum
(feature, N)¶ Rolling Sum
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling sum
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Std
(feature, N)¶ Rolling Std
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling std
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Var
(feature, N)¶ Rolling Variance
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling variance
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Skew
(feature, N)¶ Rolling Skewness
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling skewness
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Kurt
(feature, N)¶ Rolling Kurtosis
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling kurtosis
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Max
(feature, N)¶ Rolling Max
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling max
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
IdxMax
(feature, N)¶ Rolling Max Index
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling max index
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Min
(feature, N)¶ Rolling Min
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling min
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
IdxMin
(feature, N)¶ Rolling Min Index
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling min index
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Quantile
(feature, N, qscore)¶ Rolling Quantile
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling quantile
Return type: -
__init__
(feature, N, qscore)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Med
(feature, N)¶ Rolling Median
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling median
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Mad
(feature, N)¶ Rolling Mean Absolute Deviation
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling mean absolute deviation
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Rank
(feature, N)¶ Rolling Rank (Percentile)
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling rank
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Count
(feature, N)¶ Rolling Count
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling count of number of non-NaN elements
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Delta
(feature, N)¶ Rolling Delta
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with end minus start in rolling window
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Slope
(feature, N)¶ Rolling Slope
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with linear regression slope of given window
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
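The regression behind a rolling slope can be sketched as follows: within each window, the values are regressed on their positions 0, 1, ..., and the slope of the fitted line is kept. This is a plain-Python sketch rather than Qlib's code; the handling of short windows (NaN for a single point) is an assumption.

```python
import math


def rolling_slope(values, n):
    """Least-squares slope of each window against positions 0..len(window)-1."""
    out = []
    for i in range(len(values)):
        y = values[max(0, i - n + 1): i + 1]
        m = len(y)
        if m < 2:
            out.append(math.nan)  # a slope needs at least two points
            continue
        x_mean = (m - 1) / 2
        y_mean = sum(y) / m
        sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
        sxx = sum((x - x_mean) ** 2 for x in range(m))
        out.append(sxy / sxx)
    return out


slopes = rolling_slope([1.0, 3.0, 5.0, 4.0], 3)
print(slopes[1:])  # [2.0, 2.0, 0.5]
```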
-
class
qlib.data.ops.
Rsquare
(feature, N)¶ Rolling R-value Square
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with linear regression r-value square of given window
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
Resi
(feature, N)¶ Rolling Regression Residuals
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with regression residuals of given window
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
WMA
(feature, N)¶ Rolling WMA
Parameters: - feature (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with weighted moving average output
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
EMA
(feature, N)¶ Rolling Exponential Mean (EMA)
Parameters: - feature (Expression) – feature instance
- N (int, float) – rolling window size
Returns: a feature instance with the exponential moving average of given window
Return type: -
__init__
(feature, N)¶ Initialize self. See help(type(self)) for accurate signature.
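A minimal sketch of an exponentially weighted mean, in plain Python. Two points are assumptions and not taken from this reference: an integer N is mapped to a smoothing factor alpha = 2 / (N + 1), and a fractional N between 0 and 1 is used directly as alpha. The weights are normalised so that early values are not biased towards zero.

```python
def ema(values, n):
    """Exponentially weighted mean with normalised weights (assumed convention)."""
    if n <= 0:
        raise ValueError("n must be positive in this sketch")
    alpha = n if n < 1 else 2.0 / (n + 1)
    out, num, den = [], 0.0, 0.0
    for v in values:
        num = v + (1 - alpha) * num  # weighted sum of values seen so far
        den = 1 + (1 - alpha) * den  # sum of the weights seen so far
        out.append(num / den)
    return out


smoothed = ema([1.0, 2.0, 3.0], 3)  # alpha = 0.5
print(smoothed[0])                  # 1.0
```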
-
class
qlib.data.ops.
PairRolling
(feature_left, feature_right, N, func)¶ Pair Rolling Operator
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling output of two input features
Return type: -
__init__
(feature_left, feature_right, N, func)¶ Initialize self. See help(type(self)) for accurate signature.
-
get_longest_back_rolling
()¶ Get the longest length of historical data the feature has accessed
This was originally designed to determine the range of data needed to calculate the features in a specific range. However, cases such as Ref(Ref($close, -1), 1) cannot be handled correctly.
So this is only used for detecting the length of historical data needed.
-
get_extended_window_size
()¶ Get the extended window size.
To calculate this operator in the range [start_index, end_index], we have to get the leaf feature in the range [start_index - lft_etd, end_index + rght_etd].
Returns: lft_etd, rght_etd Return type: (int, int)
-
class
qlib.data.ops.
Corr
(feature_left, feature_right, N)¶ Rolling Correlation
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling correlation of two input features
Return type: -
__init__
(feature_left, feature_right, N)¶ Initialize self. See help(type(self)) for accurate signature.
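For intuition, a pair-rolling operator applies a two-input function to aligned windows of both features. The sketch below computes a rolling Pearson correlation in plain Python; it is not Qlib's implementation, and returning NaN when a window has zero variance is an assumption.

```python
import math


def rolling_corr(left, right, n):
    """Pearson correlation of aligned windows of two equally long sequences."""
    out = []
    for i in range(len(left)):
        a = left[max(0, i - n + 1): i + 1]
        b = right[max(0, i - n + 1): i + 1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a)
        vb = sum((y - mb) ** 2 for y in b)
        out.append(cov / math.sqrt(va * vb) if va > 0 and vb > 0 else math.nan)
    return out


corr = rolling_corr([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 5.0], 3)
```

Here the first entry is NaN (a single point has no variance), the next two are 1.0 because the inputs are perfectly linear over those windows, and the last is 0.5.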
-
class
qlib.data.ops.
Cov
(feature_left, feature_right, N)¶ Rolling Covariance
Parameters: - feature_left (Expression) – feature instance
- feature_right (Expression) – feature instance
- N (int) – rolling window size
Returns: a feature instance with rolling covariance of two input features
Return type: -
__init__
(feature_left, feature_right, N)¶ Initialize self. See help(type(self)) for accurate signature.
-
class
qlib.data.ops.
OpsWrapper
¶ Ops Wrapper
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
register
(ops_list: List[Union[Type[qlib.data.base.ExpressionOps], dict]])¶ register operator
Parameters: ops_list (List[Union[Type[ExpressionOps], dict]]) –
- If ops_list is a List[Type[ExpressionOps]], each element is an operator class, which should be a subclass of ExpressionOps.
- If ops_list is a List[dict], each element is an operator config in the following format:
{"class": class_name, "module_path": path}
Note: class should be the class name of the operator; module_path should be a Python module or a file path.
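A config-style ops_list might look like the fragment below. The operator name MyDiff and the module path my_project.custom_ops are hypothetical placeholders, not names that exist in Qlib.

```python
# Hypothetical operator name and module path, for illustration only.
ops_as_dicts = [
    {"class": "MyDiff", "module_path": "my_project.custom_ops"},
]

# Each entry carries exactly the two keys described above.
for entry in ops_as_dicts:
    missing = {"class", "module_path"} - set(entry)
    if missing:
        raise ValueError(f"operator config is missing keys: {sorted(missing)}")
```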
-
-
qlib.data.ops.
register_all_ops
(C)¶ Register all operators.
Cache¶
-
class
qlib.data.cache.
MemCacheUnit
(*args, **kwargs)¶ Memory Cache Unit.
-
__init__
(*args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
limited
¶ whether memory cache is limited
-
-
class
qlib.data.cache.
MemCache
(mem_cache_size_limit=None, limit_type='length')¶ Memory cache.
-
__init__
(mem_cache_size_limit=None, limit_type='length')¶ Parameters: - mem_cache_size_limit – the maximum size of the cache.
- limit_type (str) – 'length' or 'sizeof'. With 'length' the size is measured by calling len; with 'sizeof' it is measured by calling sys.getsizeof.
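The difference between the two limit types can be sketched with a toy cache. ToyMemCache is a hypothetical class written for this illustration, not Qlib's MemCache; treating 'length' as "each entry costs one unit", treating a limit of 0 as unlimited, and evicting the least recently used entry first are all assumptions.

```python
import sys
from collections import OrderedDict


class ToyMemCache:
    """Toy cache illustrating the two limit types; not qlib's implementation."""

    def __init__(self, size_limit=0, limit_type="length"):
        if limit_type not in ("length", "sizeof"):
            raise ValueError("limit_type must be 'length' or 'sizeof'")
        self.size_limit = size_limit  # 0 means unlimited in this sketch
        self.limit_type = limit_type
        self._data = OrderedDict()
        self._size = 0

    def _cost(self, value):
        # 'length' counts entries; 'sizeof' counts bytes via sys.getsizeof.
        return 1 if self.limit_type == "length" else sys.getsizeof(value)

    def __setitem__(self, key, value):
        if key in self._data:
            self._size -= self._cost(self._data.pop(key))
        self._data[key] = value
        self._size += self._cost(value)
        # Evict the oldest entries until the cache fits the limit again.
        while self.size_limit and self._size > self.size_limit and self._data:
            _, old = self._data.popitem(last=False)
            self._size -= self._cost(old)

    def __getitem__(self, key):
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def __len__(self):
        return len(self._data)


cache = ToyMemCache(size_limit=2, limit_type="length")
for k in ("a", "b", "c"):
    cache[k] = k.upper()
print(len(cache))  # 2
```

After the third insert the oldest key "a" has been evicted, so only "b" and "c" remain.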
-
-
class
qlib.data.cache.
ExpressionCache
(provider)¶ Expression cache mechanism base class.
This class is used to wrap expression provider with self-defined expression cache mechanism.
Note
Override the _uri and _expression method to create your own expression cache mechanism.
-
expression
(instrument, field, start_time, end_time, freq)¶ Get expression data.
Note
Same interface as expression method in expression provider
-
update
(cache_uri: Union[str, pathlib.Path], freq: str = 'day')¶ Update expression cache to latest calendar.
Override this method to define how the expression cache is updated under the user's own cache mechanism.
Parameters: - cache_uri (str or Path) – the complete URI of the expression cache file (including the directory path).
- freq (str) –
Returns: 0(successful update)/ 1(no need to update)/ 2(update failure).
Return type: int
-
-
class
qlib.data.cache.
DatasetCache
(provider)¶ Dataset cache mechanism base class.
This class is used to wrap dataset provider with self-defined dataset cache mechanism.
Note
Override the _uri and _dataset method to create your own dataset cache mechanism.
-
dataset
(instruments, fields, start_time=None, end_time=None, freq='day', disk_cache=1, inst_processors=[])¶ Get feature dataset.
Note
Same interface as dataset method in dataset provider
Note
The server uses redis_lock to make sure read-write conflicts are not triggered, but client readers are not considered.
-
update
(cache_uri: Union[str, pathlib.Path], freq: str = 'day')¶ Update dataset cache to latest calendar.
Override this method to define how the dataset cache is updated under the user's own cache mechanism.
Parameters: - cache_uri (str or Path) – the complete URI of the dataset cache file (including the directory path).
- freq (str) –
Returns: 0(successful update)/ 1(no need to update)/ 2(update failure)
Return type: int
-
static
cache_to_origin_data
(data, fields)¶ Convert cache data to original data.
Parameters: - data – pd.DataFrame, cache data.
- fields – feature fields.
Returns: pd.DataFrame.
-
static
normalize_uri_args
(instruments, fields, freq)¶ normalize uri args
-
-
class
qlib.data.cache.
DiskExpressionCache
(provider, **kwargs)¶ Prepared cache mechanism for server.
-
__init__
(provider, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
gen_expression_cache
(expression_data, cache_path, instrument, field, freq, last_update)¶ Save the expression data to a bin file, in the same way as feature data.
-
update
(sid, cache_uri, freq: str = 'day')¶ Update expression cache to latest calendar.
Override this method to define how the expression cache is updated under the user's own cache mechanism.
Parameters: - cache_uri (str or Path) – the complete URI of the expression cache file (including the directory path).
- freq (str) –
Returns: 0(successful update)/ 1(no need to update)/ 2(update failure).
Return type: int
-
-
class
qlib.data.cache.
DiskDatasetCache
(provider, **kwargs)¶ Prepared cache mechanism for server.
-
__init__
(provider, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
classmethod
read_data_from_cache
(cache_path: Union[str, pathlib.Path], start_time, end_time, fields)¶ read_cache_from
This function can read data from the disk cache dataset
Parameters: - cache_path –
- start_time –
- end_time –
- fields – The fields order of the dataset cache is sorted. So rearrange the columns to make it consistent.
Returns:
-
class
IndexManager
(cache_path: Union[str, pathlib.Path])¶ This class does not handle locking; callers must manage the lock outside of it. The class is a proxy for the data on disk.
-
__init__
(cache_path: Union[str, pathlib.Path])¶ Initialize self. See help(type(self)) for accurate signature.
-
-
gen_dataset_cache
(cache_path: Union[str, pathlib.Path], instruments, fields, freq, inst_processors=[])¶ Note
This function does not handle the cache read-write lock. Please acquire the lock outside this function.
The cache consists of 3 parts (each followed by a typical filename).
index : cache/d41366901e25de3ec47297f12e2ba11d.index
The content of the file may be in the following format (pandas.Series):
                     start  end
1999-11-10 00:00:00      0    1
1999-11-11 00:00:00      1    2
1999-11-12 00:00:00      2    3
...
Note
The start is closed. The end is open.
- Each line contains two elements <start_index, end_index>, with a timestamp as its index.
- It indicates the start_index (included) and end_index (excluded) of the data for that timestamp.
meta data: cache/d41366901e25de3ec47297f12e2ba11d.meta
data : cache/d41366901e25de3ec47297f12e2ba11d
- This is an HDF file sorted by datetime.
Parameters: - cache_path – The path to store the cache.
- instruments – The instruments to store the cache.
- fields – The fields to store the cache.
- freq – The freq to store the cache.
- inst_processors – Instrument processors.
Returns: a DataFrame whose fields are consistent with the parameters of the function.
Return type: pd.DataFrame
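The half-open index convention (start included, end excluded) maps directly onto Python slicing. The sketch below uses a plain dict and list as stand-ins for the index file and the data file; the names index, rows and rows_between are hypothetical.

```python
# Toy index: timestamp -> (start_index, end_index), start included, end excluded.
index = {
    "1999-11-10": (0, 1),
    "1999-11-11": (1, 2),
    "1999-11-12": (2, 3),
}
# Toy data rows, sorted by datetime like the data file.
rows = ["row@1999-11-10", "row@1999-11-11", "row@1999-11-12"]


def rows_between(start_time, end_time):
    """Return the rows for all timestamps in [start_time, end_time]."""
    keys = [t for t in sorted(index) if start_time <= t <= end_time]
    if not keys:
        return []
    first = index[keys[0]][0]   # start_index of the first timestamp (included)
    last = index[keys[-1]][1]   # end_index of the last timestamp (excluded)
    return rows[first:last]


print(rows_between("1999-11-11", "1999-11-12"))
# ['row@1999-11-11', 'row@1999-11-12']
```

Because the end is open, the end_index of one timestamp equals the start_index of the next, and no off-by-one adjustment is needed when slicing.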
-
update
(cache_uri, freq: str = 'day')¶ Update dataset cache to latest calendar.
Override this method to define how the dataset cache is updated under the user's own cache mechanism.
Parameters: - cache_uri (str or Path) – the complete URI of the dataset cache file (including the directory path).
- freq (str) –
Returns: 0(successful update)/ 1(no need to update)/ 2(update failure)
Return type: int
-
Storage¶
-
class
qlib.data.storage.storage.
BaseStorage
¶
-
class
qlib.data.storage.storage.
CalendarStorage
(freq: str, future: bool, **kwargs)¶ The methods of CalendarStorage behave the same as the List methods of the same name.
-
__init__
(freq: str, future: bool, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
data
¶ get all data
Raises: ValueError
– If the data(storage) does not exist, raise ValueError
-
index
(value: str) → int¶ Raises: ValueError
– If the data(storage) does not exist, raise ValueError
-
-
class
qlib.data.storage.storage.
InstrumentStorage
(market: str, **kwargs)¶ -
__init__
(market: str, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
data
¶ get all data
Raises: ValueError
– If the data(storage) does not exist, raise ValueError
-
update
([E, ]**F) → None. Update D from mapping/iterable E and F.¶ Notes
If E present and has a .keys() method, does: for k in E: D[k] = E[k]
If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v
-
-
class
qlib.data.storage.storage.
FeatureStorage
(instrument: str, field: str, freq: str, **kwargs)¶ -
__init__
(instrument: str, field: str, freq: str, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
data
¶ get all data
Notes
If the data (storage) does not exist, an empty pd.Series is returned: pd.Series(dtype=np.float32)
-
start_index
¶ get FeatureStorage start index
Notes
If the data(storage) does not exist, return None
-
end_index
¶ get FeatureStorage end index
Notes
The right index of the data range (both sides are closed)
The next data appending point will be end_index + 1
If the data(storage) does not exist, return None
-
write
(data_array: Union[List[T], numpy.ndarray, Tuple], index: int = None)¶ Write data_array to FeatureStorage starting from index.
Notes
If index is None, append data_array to the feature.
If len(data_array) == 0, return without writing anything.
If (index - self.end_index) >= 1, self[end_index+1: index] will be filled with np.nan
Examples
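The write rules above can be modelled with a small in-memory class. ToyFeatureStorage is a hypothetical stand-in, not Qlib's FeatureStorage: it keeps values in a list, and both starting an empty storage at index 0 and rejecting writes before start_index are assumptions of the sketch.

```python
import math


class ToyFeatureStorage:
    """In-memory model of the write rules; not qlib's FeatureStorage."""

    def __init__(self):
        self.start_index = None
        self.values = []

    @property
    def end_index(self):
        if self.start_index is None:
            return None
        return self.start_index + len(self.values) - 1  # closed right edge

    def write(self, data_array, index=None):
        if len(data_array) == 0:
            return
        if self.start_index is None:  # empty storage
            self.start_index = 0 if index is None else index
            self.values = list(data_array)
            return
        if index is None:  # append after the current end
            index = self.end_index + 1
        if index < self.start_index:
            raise ValueError("this sketch does not support writing before start_index")
        offset = index - self.start_index
        gap = offset - len(self.values)
        if gap > 0:  # fill the hole between end_index and index with NaN
            self.values.extend([math.nan] * gap)
        self.values[offset:offset + len(data_array)] = data_array


s = ToyFeatureStorage()
s.write([1.0, 2.0], index=3)  # covers indices 3..4
s.write([3.0])                # appended at end_index + 1 = 5
s.write([9.0], index=8)       # indices 6..7 are filled with NaN
print(s.start_index, s.end_index)  # 3 8
```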
-
rebase
(start_index: int = None, end_index: int = None)¶ Rebase the start_index and end_index of the FeatureStorage.
start_index and end_index are closed intervals: [start_index, end_index]
Examples
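A rebase over a closed interval can be sketched as a pure function on (start, values). This is an illustration, not Qlib's code; padding with NaN when the new range is wider than the stored range is an assumption.

```python
import math


def rebase(start, values, start_index=None, end_index=None):
    """Return (new_start, new_values) covering the closed range [start_index, end_index]."""
    end = start + len(values) - 1
    new_start = start if start_index is None else start_index
    new_end = end if end_index is None else end_index
    out = []
    for i in range(new_start, new_end + 1):
        # Keep stored values inside the old range; pad with NaN outside it.
        out.append(values[i - start] if start <= i <= end else math.nan)
    return new_start, out


print(rebase(3, [4.0, 5.0, 6.0], start_index=4))  # (4, [5.0, 6.0])
print(rebase(3, [4.0, 5.0, 6.0], end_index=4))    # (3, [4.0, 5.0])
```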
-
rewrite
(data: Union[List[T], numpy.ndarray, Tuple], index: int)¶ overwrite all data in FeatureStorage with data
Parameters: - data (Union[List, np.ndarray, Tuple]) – data
- index (int) – data start index
-
-
class
qlib.data.storage.file_storage.
FileCalendarStorage
(freq: str, future: bool, **kwargs)¶ -
__init__
(freq: str, future: bool, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
data
¶ get all data
Raises: ValueError
– If the data(storage) does not exist, raise ValueError
-
index
(value: str) → int¶ Raises: ValueError
– If the data(storage) does not exist, raise ValueError
-
-
class
qlib.data.storage.file_storage.
FileInstrumentStorage
(market: str, **kwargs)¶ -
__init__
(market: str, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
data
¶ get all data
Raises: ValueError
– If the data(storage) does not exist, raise ValueError
-
update
([E, ]**F) → None. Update D from mapping/iterable E and F.¶ Notes
If E present and has a .keys() method, does: for k in E: D[k] = E[k]
If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v
-
-
class
qlib.data.storage.file_storage.
FileFeatureStorage
(instrument: str, field: str, freq: str, **kwargs)¶ -
__init__
(instrument: str, field: str, freq: str, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
data
¶ get all data
Notes
If the data (storage) does not exist, an empty pd.Series is returned: pd.Series(dtype=np.float32)
-
write
(data_array: Union[List[T], numpy.ndarray], index: int = None) → None¶ Write data_array to FeatureStorage starting from index.
Notes
If index is None, append data_array to the feature.
If len(data_array) == 0, return without writing anything.
If (index - self.end_index) >= 1, self[end_index+1: index] will be filled with np.nan
Examples
-
start_index
¶ get FeatureStorage start index
Notes
If the data(storage) does not exist, return None
-
end_index
¶ get FeatureStorage end index
Notes
The right index of the data range (both sides are closed)
The next data appending point will be end_index + 1
If the data(storage) does not exist, return None
-
Dataset¶
Dataset Class¶
-
class
qlib.data.dataset.__init__.
Dataset
(**kwargs)¶ Preparing data for model training and inferencing.
-
__init__
(**kwargs)¶ init is designed to complete the following steps:
- Initialize the sub-instances and the state of the dataset (the information needed to prepare the data).
- The names of essential state attributes for preparing data should not start with ‘_’, so that they are saved to disk when the dataset is serialized.
- Set up the data.
- The names of data-related attributes should start with ‘_’, so that they are not saved to disk when the dataset is serialized.
The data could specify the info to calculate the essential data for preparation.
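The underscore naming convention can be sketched as follows. ToyDataset is a hypothetical class, and JSON is used only to keep the sketch self-contained; how Qlib actually serializes a Dataset is not shown here.

```python
import json


class ToyDataset:
    """Toy model of the underscore naming convention; not qlib's Dataset."""

    def __init__(self, segments):
        self.segments = segments  # essential state: kept when serializing
        self._data = None         # data attribute: dropped when serializing

    def setup_data(self):
        # Stands in for an expensive load from the data source.
        self._data = [1.0, 2.0, 3.0]

    def dumps(self):
        state = {k: v for k, v in vars(self).items() if not k.startswith("_")}
        return json.dumps(state)

    @classmethod
    def loads(cls, text):
        obj = cls.__new__(cls)
        obj.__dict__.update(json.loads(text))
        obj._data = None          # data is absent until setup_data is called
        return obj


ds = ToyDataset({"train": ["2008-01-01", "2014-12-31"]})
ds.setup_data()
text = ds.dumps()                 # contains "segments" but not "_data"
restored = ToyDataset.loads(text)
restored.setup_data()             # reload the data on top of the saved state
```

The restored object keeps its configuration but has to call setup_data again, which is why data setup is a separate step from initialization.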
-
config
(**kwargs)¶ config is designed to configure the parameters that cannot be learned from the data.
-
setup_data
(**kwargs)¶ Setup the data.
setup_data is a separate function to support the following situation:
- The user has a Dataset object with learned status on disk.
- The user loads the Dataset object from the disk.
- The user calls setup_data to load new data.
- The user prepares data for the model based on the previous status.
-
prepare
(**kwargs) → object¶ The type of dataset depends on the model (it could be pd.DataFrame, pytorch.DataLoader, etc.). The parameters should specify the scope of the prepared data. The method should:
- process the data
- return the processed data
Returns: the prepared object
Return type: object
-
-
class
qlib.data.dataset.__init__.
DatasetH
(handler: Union[Dict[KT, VT], qlib.data.dataset.handler.DataHandler], segments: Dict[str, Tuple], **kwargs)¶ Dataset with Data(H)andler
Users should try to put data preprocessing functions into the handler. Only the following kinds of data processing should be placed in Dataset:
- The processing is related to specific model.
- The processing is related to data split.
-
__init__
(handler: Union[Dict[KT, VT], qlib.data.dataset.handler.DataHandler], segments: Dict[str, Tuple], **kwargs)¶ Setup the underlying data.
Parameters: - handler (Union[dict, DataHandler]) –
handler could be:
- instance of DataHandler
- config of DataHandler. Please refer to DataHandler
- segments (dict) – Describe the options to segment the data. Here are some examples:
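A segments dict might look like the sketch below, mapping each segment name to a (start, end) pair of date strings. The segment names and dates are illustrative choices, and check_segments is a hypothetical helper, not part of Qlib.

```python
from datetime import date

# Illustrative segment names and date ranges.
segments = {
    "train": ("2008-01-01", "2014-12-31"),
    "valid": ("2015-01-01", "2016-12-31"),
    "test": ("2017-01-01", "2020-08-01"),
}


def check_segments(segments):
    """Raise if any segment ends before it starts; return the segment names."""
    for name, (start, end) in segments.items():
        if date.fromisoformat(start) > date.fromisoformat(end):
            raise ValueError(f"segment {name!r} ends before it starts")
    return list(segments)


print(check_segments(segments))  # ['train', 'valid', 'test']
```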
-
config
(handler_kwargs: dict = None, **kwargs)¶ Initialize the DatasetH
Parameters: - handler_kwargs (dict) –
Config of DataHandler, which could include the following arguments:
- arguments of DataHandler.conf_data, such as ‘instruments’, ‘start_time’ and ‘end_time’.
- kwargs (dict) –
Config of DatasetH, such as
- segments : dict
- Config of segments which is same as ‘segments’ in self.__init__
-
setup_data
(handler_kwargs: dict = None, **kwargs)¶ Setup the Data
Parameters: handler_kwargs (dict) – init arguments of DataHandler, which could include the following arguments:
- init_type : Init Type of Handler
- enable_cache : whether to enable cache
-
prepare
(segments: Union[List[str], Tuple[str], str, slice], col_set='__all', data_key='infer', **kwargs) → Union[List[pandas.core.frame.DataFrame], pandas.core.frame.DataFrame]¶ Prepare the data for learning and inference.
Parameters: - segments (Union[List[Text], Tuple[Text], Text, slice]) –
Describe the scope of the data to be prepared. Here are some examples:
- ’train’
- [‘train’, ‘valid’]
- col_set (str) – The col_set will be passed to self.handler when fetching data.
- data_key (str) – The data to fetch: DK_*. Default is DK_I, which indicates fetching data for inference.
- kwargs – The parameters that kwargs may contain:
- flt_col : str
- It only exists in TSDatasetH and can be used to add a column of data (True or False) to filter data. This parameter is only supported when the dataset is an instance of TSDatasetH.
Return type: Union[List[pd.DataFrame], pd.DataFrame]
Raises: NotImplementedError:
-
class
qlib.data.dataset.__init__.
TSDataSampler
(data: pandas.core.frame.DataFrame, start, end, step_len: int, fillna_type: str = 'none', dtype=None, flt_data=None)¶ (T)ime-(S)eries DataSampler. This is the result of TSDatasetH.
It works like torch.utils.data.Dataset and provides a convenient interface for constructing a time-series dataset from tabular data.
- On the time-step dimension, a smaller index indicates historical data and a larger index indicates future data.
If users have further requirements for processing data, they can process it based on TSDataSampler or create more powerful subclasses.
Known Issues:
- For performance reasons, this sampler converts the dataframe into arrays. This could result in a different data type.
-
__init__
(data: pandas.core.frame.DataFrame, start, end, step_len: int, fillna_type: str = 'none', dtype=None, flt_data=None)¶ Build a dataset which looks like torch.utils.data.Dataset.
Parameters: - data (pd.DataFrame) – The raw tabular data
- start – The indexable start time
- end – The indexable end time
- step_len (int) – The length of the time-series step
- fillna_type (str) – How Qlib handles the sample if there is no sample on a specific date:
- none: fill with np.nan
- ffill: forward-fill with the previous sample
- ffill+bfill: forward-fill with previous samples first, then fill with later samples
- flt_data (pd.Series) – a column of data (True or False) used to filter the data.
- None: keep all data
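The windowing and fill options can be sketched on a single instrument's series. This is plain Python rather than Qlib's sampler: missing dates are represented by None, positions before the start of the series are treated as missing, and the function names are hypothetical.

```python
import math


def fill(window, fillna_type):
    """Apply the fill rule to a window in which None marks a missing sample."""
    out = list(window)
    if fillna_type in ("ffill", "ffill+bfill"):
        for k in range(1, len(out)):
            if out[k] is None:
                out[k] = out[k - 1]  # carry the previous sample forward
    if fillna_type == "ffill+bfill":
        for k in range(len(out) - 2, -1, -1):
            if out[k] is None:
                out[k] = out[k + 1]  # then fill from the later sample
    return [math.nan if v is None else v for v in out]


def sample_window(series, i, step_len, fillna_type="none"):
    """Return the step_len values ending at position i, oldest first."""
    window = [series[j] if j >= 0 else None for j in range(i - step_len + 1, i + 1)]
    return fill(window, fillna_type)


series = [1.0, None, 3.0, 4.0]
print(sample_window(series, 2, 3, "ffill"))        # [1.0, 1.0, 3.0]
print(sample_window(series, 1, 3, "ffill+bfill"))  # [1.0, 1.0, 1.0]
```

Within each window the smaller position holds the older sample, matching the time-step ordering described for the sampler.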
-
get_index
()¶ Get the pandas index of the data. It is useful in the following scenarios:
- A special sampler will be used (e.g. the user wants to sample day by day).
-
static
build_index
(data: pandas.core.frame.DataFrame) → Tuple[pandas.core.frame.DataFrame, dict]¶ The relation of the data
Parameters: data (pd.DataFrame) – The dataframe with <datetime, DataFrame>
Returns: - the first element: the original index reshaped into a <datetime(row), instrument(column)> 2D dataframe
instrument  SH600000  SH600004  SH600006  SH600007  SH600008  SH600009  …
datetime
2021-01-11         0         1         2         3         4         5  …
2021-01-12      4146      4147      4148      4149      4150      4151  …
2021-01-13      8293      8294      8295      8296      8297      8298  …
2021-01-14     12441     12442     12443     12444     12445     12446  …
- the second element: {<original index>: <row, col>}
Return type: Tuple[pd.DataFrame, dict]
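The two return values can be sketched with plain dicts: a 2D lookup from (datetime, instrument) to the row position in the original data, and the reverse map from row position to (row, col) in that table. This toy build_index takes a list of (datetime, instrument) pairs instead of a DataFrame and is not Qlib's implementation.

```python
def build_index(keys):
    """keys: list of (datetime, instrument) pairs in the order the rows are stored."""
    dates = sorted({d for d, _ in keys})
    insts = sorted({s for _, s in keys})
    # 2D table: table[date][instrument] -> position in the original data, or None.
    table = {d: {s: None for s in insts} for d in dates}
    index_map = {}
    for pos, (d, s) in enumerate(keys):
        table[d][s] = pos
        index_map[pos] = (dates.index(d), insts.index(s))
    return table, index_map


keys = [
    ("2021-01-11", "SH600000"),
    ("2021-01-11", "SH600004"),
    ("2021-01-12", "SH600000"),
    ("2021-01-12", "SH600004"),
]
table, index_map = build_index(keys)
print(table["2021-01-12"]["SH600004"], index_map[3])  # 3 (1, 1)
```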
-
-
class
qlib.data.dataset.__init__.
TSDatasetH
(step_len=30, **kwargs)¶ (T)ime-(S)eries Dataset (H)andler
Convert the tabular data to Time-Series data
Requirements analysis
The typical workflow of a user to get time-series data for a sample:
- Process features.
- Slice the proper data from the data handler: dimension of sample <feature, >.
- Build the relation of samples by <time, instrument> index.
- Be able to sample time series of data <timestep, feature>.
- It will be better if the interface is like “torch.utils.data.Dataset”.
- The user could build a customized batch based on the data.
- The dimension of a batch of data <batch_idx, feature, timestep>.
-
__init__
(step_len=30, **kwargs)¶ Setup the underlying data.
Parameters: - handler (Union[dict, DataHandler]) –
handler could be:
- instance of DataHandler
- config of DataHandler. Please refer to DataHandler
- segments (dict) – Describe the options to segment the data. Here are some examples:
-
config
(**kwargs)¶ Initialize the DatasetH
Parameters: - handler_kwargs (dict) –
Config of DataHandler, which could include the following arguments:
- arguments of DataHandler.conf_data, such as ‘instruments’, ‘start_time’ and ‘end_time’.
- kwargs (dict) –
Config of DatasetH, such as
- segments : dict
- Config of segments which is same as ‘segments’ in self.__init__
-
setup_data
(**kwargs)¶ Setup the Data
Parameters: handler_kwargs (dict) – init arguments of DataHandler, which could include the following arguments:
- init_type : Init Type of Handler
- enable_cache : whether to enable cache
Data Loader¶
-
class
qlib.data.dataset.loader.
DataLoader
¶ DataLoader is designed for loading raw data from original data source.
-
load
(instruments, start_time=None, end_time=None) → pandas.core.frame.DataFrame¶ load the data as pd.DataFrame.
Example of the data (The multi-index of the columns is optional.):
                          feature                                                            label
                           $close     $volume  Ref($close, 1)  Mean($close, 3)  $high-$low  LABEL0
datetime    instrument
2010-01-04  SH600000    81.807068  17145150.0       83.737389        83.016739    2.741058  0.0032
            SH600004    13.313329  11800983.0       13.313329        13.317701    0.183632  0.0042
            SH600005    37.796539  12231662.0       38.258602        37.919757    0.970325  0.0289
Parameters: - instruments (str or dict) – it can either be the market name or the config file of instruments generated by InstrumentProvider.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
Returns: data loaded from the underlying source
Return type: pd.DataFrame
-
-
class
qlib.data.dataset.loader.
DLWParser
(config: Union[list, tuple, dict])¶ (D)ata(L)oader (W)ith (P)arser for features and names
This class is extracted so that QlibDataLoader and other dataloaders (such as QdbDataLoader) can share the fields.
-
__init__
(config: Union[list, tuple, dict])¶ Parameters: config (Union[list, tuple, dict]) – Config will be used to describe the fields and column names
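One way to picture such a config is sketched below. The accepted shapes are an assumption about how fields and column names are described (a list of expressions, an (expressions, names) pair, or a dict of such entries keyed by group name) and should be checked against the installed Qlib version; parse_fields_info and parse_config are hypothetical helpers.

```python
def parse_fields_info(fields_info):
    """Return (exprs, names) from a list of expressions or an (exprs, names) pair."""
    if len(fields_info) == 0:
        return [], []
    if isinstance(fields_info[0], str):
        # A flat list of expressions: each expression doubles as its column name.
        return list(fields_info), list(fields_info)
    exprs, names = fields_info
    return list(exprs), list(names)


def parse_config(config):
    """Parse a single fields description or a dict of them keyed by group name."""
    if isinstance(config, dict):
        return {group: parse_fields_info(info) for group, info in config.items()}
    return parse_fields_info(config)


config = {
    "feature": (["$close", "Mean($close, 3)"], ["CLOSE", "MA3"]),
    "label": ["Ref($close, -2)/Ref($close, -1) - 1"],
}
parsed = parse_config(config)
print(parsed["feature"][1])  # ['CLOSE', 'MA3']
```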
-
load_group_df
(instruments, exprs: list, names: list, start_time: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = None, end_time: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = None, gp_name: str = None) → pandas.core.frame.DataFrame¶ load the dataframe for specific group
Parameters: - instruments – the instruments.
- exprs (list) – the expressions to describe the content of the data.
- names (list) – the name of the data.
Returns: the queried dataframe.
Return type: pd.DataFrame
-
load
(instruments=None, start_time=None, end_time=None) → pandas.core.frame.DataFrame¶ load the data as pd.DataFrame.
Example of the data (The multi-index of the columns is optional.):
                          feature                                                            label
                           $close     $volume  Ref($close, 1)  Mean($close, 3)  $high-$low  LABEL0
datetime    instrument
2010-01-04  SH600000    81.807068  17145150.0       83.737389        83.016739    2.741058  0.0032
            SH600004    13.313329  11800983.0       13.313329        13.317701    0.183632  0.0042
            SH600005    37.796539  12231662.0       38.258602        37.919757    0.970325  0.0289
Parameters: - instruments (str or dict) – it can either be the market name or the config file of instruments generated by InstrumentProvider.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
Returns: data loaded from the underlying source
Return type: pd.DataFrame
-
-
class
qlib.data.dataset.loader.
QlibDataLoader
(config: Tuple[list, tuple, dict], filter_pipe: List[T] = None, swap_level: bool = True, freq: Union[str, dict] = 'day', inst_processor: dict = None)¶ The fields can be defined by config.
-
__init__
(config: Tuple[list, tuple, dict], filter_pipe: List[T] = None, swap_level: bool = True, freq: Union[str, dict] = 'day', inst_processor: dict = None)¶ Parameters: - config (Tuple[list, tuple, dict]) – Please refer to the doc of DLWParser
- filter_pipe – Filter pipe for the instruments
- swap_level – Whether to swap level of MultiIndex
- freq (dict or str) – If type(config) == dict and type(freq) == str, load config data using freq. If type(config) == dict and type(freq) == dict, load config[<group_name>] data using freq[<group_name>]
- inst_processor (dict) – If inst_processor is not None and type(config) == dict; load config[<group_name>] data using inst_processor[<group_name>]
-
load_group_df
(instruments, exprs: list, names: list, start_time: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = None, end_time: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = None, gp_name: str = None) → pandas.core.frame.DataFrame¶ load the dataframe for specific group
Parameters: - instruments – the instruments.
- exprs (list) – the expressions to describe the content of the data.
- names (list) – the name of the data.
Returns: the queried dataframe.
Return type: pd.DataFrame
-
-
class
qlib.data.dataset.loader.
StaticDataLoader
(config: dict, join='outer')¶ DataLoader that supports loading data from file or as provided.
-
__init__
(config: dict, join='outer')¶ Parameters: - config (dict) – {fields_group: <path or object>}
- join (str) – How to align different dataframes
-
load
(instruments=None, start_time=None, end_time=None) → pandas.core.frame.DataFrame¶ load the data as pd.DataFrame.
Example of the data (The multi-index of the columns is optional.):
                          feature                                                            label
                           $close     $volume  Ref($close, 1)  Mean($close, 3)  $high-$low  LABEL0
datetime    instrument
2010-01-04  SH600000    81.807068  17145150.0       83.737389        83.016739    2.741058  0.0032
            SH600004    13.313329  11800983.0       13.313329        13.317701    0.183632  0.0042
            SH600005    37.796539  12231662.0       38.258602        37.919757    0.970325  0.0289
Parameters: - instruments (str or dict) – it can either be the market name or the config file of instruments generated by InstrumentProvider.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
Returns: data loaded from the underlying source
Return type: pd.DataFrame
-
-
class
qlib.data.dataset.loader.
DataLoaderDH
(handler_config: dict, fetch_kwargs: dict = {}, is_group=False)¶ DataLoader based on (D)ata (H)andler. It is designed to load multiple data from data handlers.
- If you just want to load data from a single data handler, you can write them in a single data handler.
TODO: What makes this module not that easy to use.
- For online scenarios:
- The underlying data handler should be configured, but the data loader doesn’t provide such an interface or hook.
-
__init__
(handler_config: dict, fetch_kwargs: dict = {}, is_group=False)¶ Parameters: - handler_config (dict) – handler_config will be used to describe the handlers
- fetch_kwargs (dict) – fetch_kwargs will be used to describe the different arguments of fetch method, such as col_set, squeeze, data_key, etc.
- is_group (bool) – is_group will be used to describe whether the key of handler_config is group
-
load
(instruments=None, start_time=None, end_time=None) → pandas.core.frame.DataFrame¶ load the data as pd.DataFrame.
Example of the data (The multi-index of the columns is optional.):
                          feature                                                            label
                           $close     $volume  Ref($close, 1)  Mean($close, 3)  $high-$low  LABEL0
datetime    instrument
2010-01-04  SH600000    81.807068  17145150.0       83.737389        83.016739    2.741058  0.0032
            SH600004    13.313329  11800983.0       13.313329        13.317701    0.183632  0.0042
            SH600005    37.796539  12231662.0       38.258602        37.919757    0.970325  0.0289
Parameters: - instruments (str or dict) – it can either be the market name or the config file of instruments generated by InstrumentProvider.
- start_time (str) – start of the time range.
- end_time (str) – end of the time range.
Returns: data loaded from the underlying source
Return type: pd.DataFrame
Data Handler¶
-
class
qlib.data.dataset.handler.
DataHandler
(instruments=None, start_time=None, end_time=None, data_loader: Union[dict, str, qlib.data.dataset.loader.DataLoader] = None, init_data=True, fetch_orig=True)¶ The steps to using a handler:
1. Initialize the data handler (done by init).
2. Use the data.
The data handler tries to maintain data with a 2-level index: datetime and instruments.
Any order of the index levels is supported (the order is implied by the data). The order <datetime, instruments> is used when the dataframe index names are missing.
Example of the data: The multi-index of the columns is optional.
                         feature                                                               label
                          $close     $volume  Ref($close, 1)  Mean($close, 3)  $high-$low     LABEL0
datetime   instrument
2010-01-04 SH600000    81.807068  17145150.0       83.737389        83.016739    2.741058     0.0032
           SH600004    13.313329  11800983.0       13.313329        13.317701    0.183632     0.0042
           SH600005    37.796539  12231662.0       38.258602        37.919757    0.970325     0.0289
Tips for improving the performance of the data handler:
- Fetching data with col_set=CS_RAW returns the raw data and may keep pandas from copying the data when calling loc.
-
__init__
(instruments=None, start_time=None, end_time=None, data_loader: Union[dict, str, qlib.data.dataset.loader.DataLoader] = None, init_data=True, fetch_orig=True)¶ Parameters: - instruments – The stock list to retrieve.
- start_time – start_time of the original data.
- end_time – end_time of the original data.
- data_loader (Union[dict, str, DataLoader]) – data loader to load the data.
- init_data – initialize the original data in the constructor.
- fetch_orig (bool) – Return the original data instead of copy if possible.
-
config
(**kwargs)¶ configuration of data. # what data to be loaded from data source
This method will be used when loading pickled handler from dataset. The data will be initialized with different time range.
-
setup_data
(enable_cache: bool = False)¶ Set up the data, in case initialization is run multiple times.
It is responsible for maintaining the following variable: 1) self._data
Parameters: enable_cache (bool) – default value is False:
- if enable_cache == True: the processed data will be saved on disk, and the handler will load the cached data from disk directly the next time init is called.
-
fetch
(selector: Union[pandas._libs.tslibs.timestamps.Timestamp, slice, str] = slice(None, None, None), level: Union[str, int] = 'datetime', col_set: Union[str, List[str]] = '__all', squeeze: bool = False, proc_func: Callable = None) → pandas.core.frame.DataFrame¶ fetch data from underlying data source
Parameters: - selector (Union[pd.Timestamp, slice, str]) – describe how to select data by index
- level (Union[str, int]) – which index level to select the data
- col_set (Union[str, List[str]]) –
- if isinstance(col_set, str): select a set of meaningful columns (e.g. features, columns).
- if col_set == CS_RAW: the raw dataset will be returned.
- if isinstance(col_set, List[str]): select several sets of meaningful columns; the returned data has multiple levels.
- proc_func (Callable) –
- Give a hook for processing data before fetching
- An example to explain the necessity of the hook:
- A Dataset has learned some processors to process data, and these are related to data segmentation.
- It applies them every time data is prepared.
- The learned processors require the dataframe to keep the same format when fitting and applying.
- However, the data format changes according to the parameters.
- So the processors should be applied to the underlying data.
- squeeze (bool) – whether to squeeze columns and index
Return type: pd.DataFrame
-
get_cols
(col_set='__all') → list¶ get the column names
Parameters: col_set (str) – select a set of meaningful columns.(e.g. features, columns) Returns: list of column names Return type: list
-
get_range_selector
(cur_date: Union[pandas._libs.tslibs.timestamps.Timestamp, str], periods: int) → slice¶ get range selector by number of periods
Parameters: - cur_date (pd.Timestamp or str) – current date
- periods (int) – number of periods
-
get_range_iterator
(periods: int, min_periods: Optional[int] = None, **kwargs) → Iterator[Tuple[pandas._libs.tslibs.timestamps.Timestamp, pandas.core.frame.DataFrame]]¶ get an iterator of sliced data with the given periods
Parameters: - periods (int) – number of periods.
- min_periods (int) – minimum periods for sliced dataframe.
- kwargs (dict) – will be passed to self.fetch.
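The two range helpers above can be pictured with a plain trading calendar. The following is a minimal sketch, not the library's implementation: it assumes the window ends at cur_date, spans periods trading days, and is returned as an inclusive label-based slice. The names range_selector and range_iterator are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def range_selector(calendar, cur_date, periods):
    """Slice covering `periods` trading days that end at cur_date (assumed semantics)."""
    end_idx = bisect_right(calendar, cur_date) - 1
    if end_idx < 0:
        raise ValueError("cur_date precedes the calendar")
    start_idx = max(end_idx - periods + 1, 0)
    return slice(calendar[start_idx], calendar[end_idx])


def range_iterator(calendar, periods, min_periods=None):
    """Yield (cur_date, selector) once at least min_periods days are available."""
    min_periods = periods if min_periods is None else min_periods
    for i, cur in enumerate(calendar):
        if i + 1 < min_periods:
            continue  # not enough history yet
        yield cur, range_selector(calendar, cur, periods)


calendar = [date(2010, 1, d) for d in (4, 5, 6, 7, 8)]
sel = range_selector(calendar, date(2010, 1, 7), 3)
windows = list(range_iterator(calendar, periods=3, min_periods=2))
```

In the real handler the resulting slice would be handed to fetch as the selector; here it is only inspected.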
-
-
class
qlib.data.dataset.handler.
DataHandlerLP
(instruments=None, start_time=None, end_time=None, data_loader: Union[dict, str, qlib.data.dataset.loader.DataLoader] = None, infer_processors: List[T] = [], learn_processors: List[T] = [], shared_processors: List[T] = [], process_type='append', drop_raw=False, **kwargs)¶ DataHandler with (L)earnable (P)rocessor
Tips for improving the performance of the data handler:
- To reduce the memory cost, set drop_raw=True; this modifies the raw data in place.
-
__init__
(instruments=None, start_time=None, end_time=None, data_loader: Union[dict, str, qlib.data.dataset.loader.DataLoader] = None, infer_processors: List[T] = [], learn_processors: List[T] = [], shared_processors: List[T] = [], process_type='append', drop_raw=False, **kwargs)¶ Parameters: - infer_processors (list) –
- list of <description info> of processors to generate data for inference
- example of <description info>:
- learn_processors (list) – similar to infer_processors, but for generating data for learning models
- process_type (str) –
PTYPE_I = ‘independent’
- self._infer will be processed by infer_processors
- self._learn will be processed by learn_processors
PTYPE_A = ‘append’
- self._infer will be processed by infer_processors
- self._learn will be processed by infer_processors + learn_processors
- (e.g. self._infer processed by learn_processors)
- drop_raw (bool) – whether to drop the raw data
-
fit
()¶ fit data without processing the data
-
fit_process_data
()¶ fit and process data
The input of the fit will be the output of the previous processor
-
process_data
(with_fit: bool = False)¶ Process the data. Run processor.fit if necessary.
Notation: (data) [processor]
# data processing flow of self.process_type == DataHandlerLP.PTYPE_I
(self._data)-[shared_processors]-(_shared_df)-[learn_processors]-(_learn_df)
                                             -[infer_processors]-(_infer_df)
# data processing flow of self.process_type == DataHandlerLP.PTYPE_A
(self._data)-[shared_processors]-(_shared_df)-[infer_processors]-(_infer_df)-[learn_processors]-(_learn_df)
Parameters: with_fit (bool) – The input of the fit will be the output of the previous processor
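The two processing flows can be mimicked with plain functions standing in for processors. This is only an illustrative sketch of the ordering; run_pipeline, process and the toy processors are hypothetical names, and fitting is left out.

```python
def run_pipeline(data, processors):
    """Apply processors in order; each one receives the previous output."""
    for proc in processors:
        data = proc(data)
    return data


def process(data, shared, infer, learn, process_type="append"):
    shared_out = run_pipeline(data, shared)
    infer_out = run_pipeline(shared_out, infer)
    if process_type == "independent":
        # PTYPE_I: learn processors branch off the shared output
        learn_out = run_pipeline(shared_out, learn)
    elif process_type == "append":
        # PTYPE_A: learn processors run on top of the infer output
        learn_out = run_pipeline(infer_out, learn)
    else:
        raise ValueError(f"unknown process_type: {process_type}")
    return infer_out, learn_out


# Toy processors working on lists of floats
def fill_zero(xs):
    return [0.0 if x is None else x for x in xs]


def double(xs):
    return [2 * x for x in xs]


def add_one(xs):
    return [x + 1 for x in xs]


raw = [1.0, None, 3.0]
infer_i, learn_i = process(raw, [fill_zero], [double], [add_one], "independent")
infer_a, learn_a = process(raw, [fill_zero], [double], [add_one], "append")
```

With 'append', the learning data sees the inference processors as well, so learn_a differs from learn_i while the inference outputs are identical.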
-
config
(processor_kwargs: dict = None, **kwargs)¶ configuration of data. # what data to be loaded from data source
This method will be used when loading pickled handler from dataset. The data will be initialized with different time range.
-
setup_data
(init_type: str = 'fit_seq', **kwargs)¶ Set up the data, in case initialization is run multiple times.
Parameters: - init_type (str) – The type IT_* listed above.
- enable_cache (bool) –
default value is False:
- if enable_cache == True: the processed data will be saved on disk, and the handler will load the cached data from disk directly the next time init is called.
-
fetch
(selector: Union[pandas._libs.tslibs.timestamps.Timestamp, slice, str] = slice(None, None, None), level: Union[str, int] = 'datetime', col_set='__all', data_key: str = 'infer', proc_func: Callable = None) → pandas.core.frame.DataFrame¶ fetch data from underlying data source
Parameters: - selector (Union[pd.Timestamp, slice, str]) – describe how to select data by index.
- level (Union[str, int]) – which index level to select the data.
- col_set (str) – select a set of meaningful columns.(e.g. features, columns).
- data_key (str) – the data to fetch: DK_*.
- proc_func (Callable) – please refer to the doc of DataHandler.fetch
Return type: pd.DataFrame
-
get_cols
(col_set='__all', data_key: str = 'infer') → list¶ get the column names
Parameters: - col_set (str) – select a set of meaningful columns.(e.g. features, columns).
- data_key (str) – the data to fetch: DK_*.
Returns: list of column names
Return type: list
Processor¶
-
qlib.data.dataset.processor.
get_group_columns
(df: pandas.core.frame.DataFrame, group: Optional[str])¶ get a group of columns from multi-index columns DataFrame
Parameters: - df (pd.DataFrame) – DataFrame with multi-index columns.
- group (str) – the name of the feature group, i.e. the first level value of the group index.
-
class
qlib.data.dataset.processor.
Processor
¶ -
fit
(df: pandas.core.frame.DataFrame = None)¶ learn data processing parameters
Parameters: df (pd.DataFrame) – When we fit and process data with processors one by one, the fit function relies on the output of the previous processor, i.e. df.
-
is_for_infer
() → bool¶ Whether this processor is usable for inference. Some processors are not usable for inference.
Returns: whether it is usable for inference. Return type: bool
-
readonly
() → bool¶ Does the processor treat the input data as read-only (i.e. it does not write to the input data) when processing?
Knowing whether a processor is read-only helps the Handler avoid unnecessary copies.
-
config
(**kwargs)¶ configure the serializable object
Parameters: - dump_all (bool) – will the object dump all object
- exclude (list) – What attribute will not be dumped
- recursive (bool) – will the configuration be recursive
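To make the roles of fit, is_for_infer and readonly concrete, here is a standalone sketch that does not import Qlib. The classes DemoProcessor, DemoFillna and DemoDropnaLabel are hypothetical stand-ins, the __call__ hook is an assumption about how a processor is applied, and rows are plain dicts instead of a DataFrame.

```python
class DemoProcessor:
    """Hypothetical stand-in for the processor interface described above."""

    def fit(self, rows=None):
        pass  # nothing to learn by default

    def __call__(self, rows):
        raise NotImplementedError

    def is_for_infer(self):
        return True

    def readonly(self):
        return False


class DemoFillna(DemoProcessor):
    def __init__(self, fill_value=0):
        self.fill_value = fill_value

    def __call__(self, rows):
        for row in rows:  # writes into the input, so not read-only
            if row["feature"] is None:
                row["feature"] = self.fill_value
        return rows


class DemoDropnaLabel(DemoProcessor):
    def __call__(self, rows):
        return [row for row in rows if row["label"] is not None]

    def is_for_infer(self):
        return False  # labels are unknown at inference time

    def readonly(self):
        return True  # builds a new list, input rows stay untouched


rows = [{"feature": 1.0, "label": 0.1}, {"feature": None, "label": None}]
procs = [DemoFillna(fill_value=0.0), DemoDropnaLabel()]

# Keep only processors usable for inference; copy only if one of them writes
infer_procs = [p for p in procs if p.is_for_infer()]
needs_copy = any(not p.readonly() for p in infer_procs)
infer_rows = [dict(r) for r in rows] if needs_copy else rows
for proc in infer_procs:
    infer_rows = proc(infer_rows)
```

The copy decision is the point of readonly: a chain made only of read-only processors could work on the original data directly.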
-
-
class
qlib.data.dataset.processor.
DropnaProcessor
(fields_group=None)¶ -
__init__
(fields_group=None)¶ Initialize self. See help(type(self)) for accurate signature.
-
readonly
()¶ Does the processor treat the input data as read-only (i.e. it does not write to the input data) when processing?
Knowing whether a processor is read-only helps the Handler avoid unnecessary copies.
-
-
class
qlib.data.dataset.processor.
DropnaLabel
(fields_group='label')¶ -
__init__
(fields_group='label')¶ Initialize self. See help(type(self)) for accurate signature.
-
is_for_infer
() → bool¶ The samples are dropped according to label. So it is not usable for inference
-
-
class
qlib.data.dataset.processor.
DropCol
(col_list=[])¶ -
__init__
(col_list=[])¶ Initialize self. See help(type(self)) for accurate signature.
-
readonly
()¶ Does the processor treat the input data as read-only (i.e. it does not write to the input data) when processing?
Knowing whether a processor is read-only helps the Handler avoid unnecessary copies.
-
-
class
qlib.data.dataset.processor.
FilterCol
(fields_group='feature', col_list=[])¶ -
__init__
(fields_group='feature', col_list=[])¶ Initialize self. See help(type(self)) for accurate signature.
-
readonly
()¶ Does the processor treat the input data as read-only (i.e. it does not write to the input data) when processing?
Knowing whether a processor is read-only helps the Handler avoid unnecessary copies.
-
-
class
qlib.data.dataset.processor.
TanhProcess
¶ Use tanh to process noise data
-
class
qlib.data.dataset.processor.
ProcessInf
¶ Process infinity
-
class
qlib.data.dataset.processor.
Fillna
(fields_group=None, fill_value=0)¶ Process NaN
-
__init__
(fields_group=None, fill_value=0)¶ Initialize self. See help(type(self)) for accurate signature.
-
-
class
qlib.data.dataset.processor.
MinMaxNorm
(fit_start_time, fit_end_time, fields_group=None)¶ -
__init__
(fit_start_time, fit_end_time, fields_group=None)¶ Initialize self. See help(type(self)) for accurate signature.
-
fit
(df)¶ learn data processing parameters
Parameters: df (pd.DataFrame) – When we fit and process data with processors one by one, the fit function relies on the output of the previous processor, i.e. df.
-
-
class
qlib.data.dataset.processor.
ZScoreNorm
(fit_start_time, fit_end_time, fields_group=None)¶ ZScore Normalization
-
__init__
(fit_start_time, fit_end_time, fields_group=None)¶ Initialize self. See help(type(self)) for accurate signature.
-
fit
(df)¶ learn data processing parameters
Parameters: df (pd.DataFrame) – When we fit and process data with processors one by one, the fit function relies on the output of the previous processor, i.e. df.
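The fit_start_time and fit_end_time arguments of MinMaxNorm and ZScoreNorm suggest that statistics are learned on a fitting window and then applied to all data. The sketch below illustrates that idea on a list of (date, value) pairs. DemoZScoreNorm is a hypothetical class, and the use of the population standard deviation is an assumption.

```python
from datetime import date
from statistics import fmean, pstdev


class DemoZScoreNorm:
    """Hypothetical z-score normalizer fitted on a time window."""

    def __init__(self, fit_start_time, fit_end_time):
        self.fit_start_time = fit_start_time
        self.fit_end_time = fit_end_time

    def fit(self, series):
        # Learn mean and std only from the fitting window
        window = [v for d, v in series
                  if self.fit_start_time <= d <= self.fit_end_time]
        self.mean_ = fmean(window)
        self.std_ = pstdev(window)  # assumption: population std

    def __call__(self, series):
        # Apply the learned parameters to every row, inside or outside the window
        return [(d, (v - self.mean_) / self.std_) for d, v in series]


series = [
    (date(2010, 1, 4), 1.0),
    (date(2010, 1, 5), 3.0),
    (date(2010, 1, 6), 5.0),
    (date(2010, 1, 7), 9.0),
]
norm = DemoZScoreNorm(date(2010, 1, 4), date(2010, 1, 6))
norm.fit(series)
normalized = norm(series)
```

Fitting on a window that ends before the test period keeps later data from influencing the normalization parameters.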
-
-
class
qlib.data.dataset.processor.
RobustZScoreNorm
(fit_start_time, fit_end_time, fields_group=None, clip_outlier=True)¶ Robust ZScore Normalization
- Use robust statistics for Z-Score normalization:
- mean(x) = median(x)
- std(x) = MAD(x) * 1.4826
- Reference:
- https://en.wikipedia.org/wiki/Median_absolute_deviation.
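The robust statistics above can be written out in a few lines. This is a sketch under assumptions: the clipping range of [-3, 3] for clip_outlier and the small epsilon guarding against a zero MAD are illustrative choices, not confirmed by this reference, and robust_zscore is a hypothetical helper.

```python
from statistics import median


def robust_zscore(values, clip_outlier=True):
    """Z-score using the median as center and MAD * 1.4826 as scale."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = mad * 1.4826 + 1e-12  # epsilon is an assumption to avoid division by zero
    z = [(v - med) / scale for v in values]
    if clip_outlier:
        # assumed clipping range
        z = [max(-3.0, min(3.0, x)) for x in z]
    return z


scores = robust_zscore([1.0, 2.0, 3.0, 4.0, 100.0])
```

The outlier 100.0 barely affects the median or the MAD, so the other values keep sensible scores, which is the motivation for using robust statistics here.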
-
__init__
(fit_start_time, fit_end_time, fields_group=None, clip_outlier=True)¶ Initialize self. See help(type(self)) for accurate signature.
-
fit
(df)¶ learn data processing parameters
Parameters: df (pd.DataFrame) – When we fit and process data with processors one by one, the fit function relies on the output of the previous processor, i.e. df.
-
class
qlib.data.dataset.processor.
CSZScoreNorm
(fields_group=None)¶ Cross Sectional ZScore Normalization
-
__init__
(fields_group=None)¶ Initialize self. See help(type(self)) for accurate signature.
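Cross-sectional normalization standardizes values across instruments within each datetime, rather than along time. A minimal sketch of that grouping follows; cs_zscore is a hypothetical helper and the population standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def cs_zscore(rows):
    """rows: (datetime, instrument, value) tuples, normalized within each datetime."""
    by_date = defaultdict(list)
    for dt, _, value in rows:
        by_date[dt].append(value)
    # One (mean, std) pair per cross-section; pstdev is an assumption
    stats = {dt: (fmean(vs), pstdev(vs)) for dt, vs in by_date.items()}
    return [(dt, inst, (v - stats[dt][0]) / stats[dt][1]) for dt, inst, v in rows]


rows = [
    ("2010-01-04", "SH600000", 1.0),
    ("2010-01-04", "SH600004", 3.0),
    ("2010-01-05", "SH600000", 10.0),
    ("2010-01-05", "SH600004", 30.0),
]
out = cs_zscore(rows)
```

Because each day is normalized on its own, no fitting window is needed, which matches the absence of fit_start_time and fit_end_time in the signature.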
-
Contrib¶
Model¶
-
class
qlib.model.base.
BaseModel
¶ Modeling things
-
predict
(*args, **kwargs) → object¶ Make predictions after modeling things
-
-
class
qlib.model.base.
Model
¶ Learnable Models
-
fit
(dataset: qlib.data.dataset.Dataset)¶ Learn model from the base model
Note
The attribute names of a learned model should not start with ‘_’, so that the model can be dumped to disk.
The following code example shows how to retrieve x_train, y_train and w_train from the dataset:
    # get features and labels
    df_train, df_valid = dataset.prepare(
        ["train", "valid"], col_set=["feature", "label"], data_key=DataHandlerLP.DK_L
    )
    x_train, y_train = df_train["feature"], df_train["label"]
    x_valid, y_valid = df_valid["feature"], df_valid["label"]

    # get weights
    try:
        wdf_train, wdf_valid = dataset.prepare(["train", "valid"], col_set=["weight"], data_key=DataHandlerLP.DK_L)
        w_train, w_valid = wdf_train["weight"], wdf_valid["weight"]
    except KeyError as e:
        w_train = pd.DataFrame(np.ones_like(y_train.values), index=y_train.index)
        w_valid = pd.DataFrame(np.ones_like(y_valid.values), index=y_valid.index)
Parameters: dataset (Dataset) – dataset will generate the processed data from model training.
-
predict
(dataset: qlib.data.dataset.Dataset, segment: Union[str, slice] = 'test') → object¶ Give predictions for the given Dataset
Parameters: - dataset (Dataset) – dataset will generate the processed dataset from model training.
- segment (Text or slice) – dataset will use this segment to prepare data. (default=test)
Returns: prediction results of a certain type, such as pandas.Series.
-
-
class
qlib.model.base.
ModelFT
¶ Model (F)ine(t)unable
-
finetune
(dataset: qlib.data.dataset.Dataset)¶ Finetune the model based on the given dataset
A typical use case of finetuning model with qlib.workflow.R
    # start exp to train init model
    with R.start(experiment_name="init models"):
        model.fit(dataset)
        R.save_objects(init_model=model)
        rid = R.get_recorder().id

    # Finetune model based on previous trained model
    with R.start(experiment_name="finetune model"):
        recorder = R.get_recorder(recorder_id=rid, experiment_name="init models")
        model = recorder.load_object("init_model")
        model.finetune(dataset, num_boost_round=10)
Parameters: dataset (Dataset) – dataset will generate the processed dataset from model training.
-
Strategy¶
-
class
qlib.contrib.strategy.strategy.
StrategyWrapper
(inner_strategy)¶ StrategyWrapper is a wrapper of another strategy. It changes the basic strategy by overriding some of its methods. Cost control and risk control are based on this class.
-
__init__
(inner_strategy)¶ Parameters: inner_strategy – set the inner strategy.
-
-
class
qlib.contrib.strategy.strategy.
AdjustTimer
¶ Responsible for timing of position adjusting
This is designed as a multiple inheritance mechanism because:
- is_adjust may need access to the internal state of a strategy.
- it can be regarded as an enhancement to the existing strategy.
-
is_adjust
(trade_date)¶ Return whether the strategy can adjust positions on trade_date. Normally used by strategies that trade at a given trade frequency.
-
class
qlib.contrib.strategy.strategy.
ListAdjustTimer
(adjust_dates=None)¶ -
__init__
(adjust_dates=None)¶ Parameters: adjust_dates – an iterable object, it will return a timelist for trading dates
-
is_adjust
(trade_date)¶ Return whether the strategy can adjust positions on trade_date. Normally used by strategies that trade at a given trade frequency.
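A list-based adjust timer can be sketched as a set membership test. DemoListAdjustTimer below is a hypothetical standalone class; treating adjust_dates=None as "adjust on every trade date" is an assumption.

```python
from datetime import date


class DemoListAdjustTimer:
    """Hypothetical timer that allows position adjustment only on listed dates."""

    def __init__(self, adjust_dates=None):
        # Assumption: None means every trade date is an adjust date
        self.adjust_dates = None if adjust_dates is None else set(adjust_dates)

    def is_adjust(self, trade_date):
        if self.adjust_dates is None:
            return True
        return trade_date in self.adjust_dates


trade_dates = [date(2017, 1, d) for d in (4, 5, 6, 9, 10)]
timer = DemoListAdjustTimer(adjust_dates=[date(2017, 1, 4), date(2017, 1, 9)])

# A strategy would generate orders only on the dates where is_adjust is True
rebalance_dates = [d for d in trade_dates if timer.is_adjust(d)]
always = DemoListAdjustTimer()
```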
-
-
class
qlib.contrib.strategy.strategy.
WeightStrategyBase
(order_generator_cls_or_obj=<class 'qlib.contrib.strategy.order_generator.OrderGenWInteract'>, *args, **kwargs)¶ -
__init__
(order_generator_cls_or_obj=<class 'qlib.contrib.strategy.order_generator.OrderGenWInteract'>, *args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
generate_target_weight_position
(score, current, trade_date)¶ Generate the target position from the score for this date and the current position. Cash is not considered in the position.
Parameters: - score (pd.Series) – pred score for this trade date, index is stock_id, contain ‘score’ column.
- current (Position()) – current position.
- trade_date (pd.Timestamp) – trade date.
-
generate_order_list
(score_series, current, trade_exchange, pred_date, trade_date)¶ Parameters: - score_series (pd.Series) – stock_id , score.
- current (Position()) – current of account.
- trade_exchange (Exchange()) – exchange.
- trade_date (pd.Timestamp) – date.
-
-
class
qlib.contrib.strategy.strategy.
TopkDropoutStrategy
(topk, n_drop, method_sell='bottom', method_buy='top', risk_degree=0.95, thresh=1, hold_thresh=1, only_tradable=False, **kwargs)¶ -
__init__
(topk, n_drop, method_sell='bottom', method_buy='top', risk_degree=0.95, thresh=1, hold_thresh=1, only_tradable=False, **kwargs)¶ Parameters: - topk (int) – the number of stocks in the portfolio.
- n_drop (int) – number of stocks to be replaced in each trading date.
- method_sell (str) – dropout method_sell, random/bottom.
- method_buy (str) – dropout method_buy, random/top.
- risk_degree (float) – position percentage of total value.
- thresh (int) – minimum holding days since the last buy signal of the stock.
- hold_thresh (int) – minimum holding days before selling a stock; checks current.get_stock_count(order.stock_id) >= self.thresh.
- only_tradable (bool) –
whether the strategy only considers tradable stocks when buying and selling.
- if only_tradable: the strategy will peek at information in the near future to avoid untradable stocks (stocks that are suspended, or that hit limit up or limit down).
- else: the strategy will generate orders without peeking at any future information, so the generated orders may fail.
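The interplay of topk and n_drop with method_sell='bottom' and method_buy='top' can be sketched for a single trading day. This is a simplified illustration, not the strategy's actual code: it ignores thresh, hold_thresh, risk_degree and tradability, and the function topk_dropout is hypothetical.

```python
def topk_dropout(scores, holdings, topk, n_drop):
    """Return (sell, buy) stock lists for one day, given scores and current holdings."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    last = [s for s in ranked if s in holdings]  # held stocks, best first
    # Best non-held candidates: enough to replace n_drop stocks and fill empty slots
    today = [s for s in ranked if s not in holdings][: n_drop + topk - len(last)]
    # Rank holdings and candidates together; the worst n_drop of them are dropped
    comb = sorted(last + today, key=scores.get, reverse=True)
    worst = set(comb[-n_drop:]) if n_drop > 0 else set()
    sell = [s for s in last if s in worst]
    buy = today[: len(sell) + topk - len(last)]
    return sell, buy


scores = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6, "E": 0.5, "F": 0.1}
sell, buy = topk_dropout(scores, holdings={"C", "E", "F"}, topk=3, n_drop=1)
first_sell, first_buy = topk_dropout(scores, holdings=set(), topk=3, n_drop=1)
```

A held stock is sold only if it ranks among the worst n_drop of the combined list, so turnover per day is bounded by n_drop once the portfolio is full.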
-
get_risk_degree
(date)¶ Return the proportion of your total value that will be used in investment. A dynamic risk_degree results in market timing.
-
generate_order_list
(score_series, current, trade_exchange, pred_date, trade_date)¶ Generate order list according to score_series at trade_date, will not change current.
Parameters: - score_series (pd.Series) – stock_id , score.
- current (Position()) – current of account.
- trade_exchange (Exchange()) – exchange.
- pred_date (pd.Timestamp) – predict date.
- trade_date (pd.Timestamp) – trade date.
-
Evaluate¶
-
qlib.contrib.evaluate.
risk_analysis
(r, N=252)¶ Risk Analysis
Parameters: - r (pandas.Series) – daily return series.
- N (int) – scaling factor for annualizing information_ratio (day: 250, week: 50, month: 12).
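The role of N as an annualizing factor can be illustrated with a small sketch. The formulas below are one common convention and are assumptions here: additive (non-compounded) cumulative returns, sample standard deviation, annualized return as mean times N, and information ratio as mean over std times the square root of N. The helper risk_metrics is hypothetical.

```python
from itertools import accumulate
from math import sqrt
from statistics import fmean, stdev


def risk_metrics(returns, N=252):
    """Summary statistics of a periodic return series (assumed conventions)."""
    mean = fmean(returns)
    std = stdev(returns)  # sample standard deviation
    cum = list(accumulate(returns))  # additive cumulative return
    running_max = list(accumulate(cum, max))
    return {
        "mean": mean,
        "std": std,
        "annualized_return": mean * N,
        "information_ratio": mean / std * sqrt(N),
        "max_drawdown": min(c - m for c, m in zip(cum, running_max)),
    }


metrics = risk_metrics([0.01, -0.02, 0.03, -0.01])
```

For weekly or monthly return series, N would be changed accordingly so that the annualized figures remain comparable.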
-
qlib.contrib.evaluate.
backtest
(pred, account=1000000000.0, shift=1, benchmark='SH000905', verbose=True, **kwargs)¶ This function will help you set a reasonable Exchange and provide default values for the strategy.
Parameters: - backtest workflow related or common arguments (-) –
- pred (pandas.DataFrame) – predictions; should have a <datetime, instrument> index and one score column.
- account (float) – init account value.
- shift (int) – whether to shift prediction by one day.
- benchmark (str) – benchmark code, default is SH000905 CSI 500.
- verbose (bool) – whether to print log.
- strategy related arguments (-) –
- strategy (Strategy()) – strategy used in backtest.
- topk (int (Default value: 50)) – top-N stocks to buy.
- margin –
    if isinstance(margin, int):
        sell_limit = margin
    else:
        sell_limit = pred_in_a_day.count() * margin
buffer margin, in single score_mode, continue holding stock if it is in nlargest(sell_limit). sell_limit should be no less than topk.
- n_drop (int) – number of stocks to be replaced in each trading date.
- risk_degree (float) – 0-1, 0.95 for example, use 95% money to trade.
- str_type ('amount', 'weight' or 'dropout') – strategy type: TopkAmountStrategy ,TopkWeightStrategy or TopkDropoutStrategy.
- exchange related arguments (-) –
- exchange (Exchange()) – pass the exchange for speeding up.
- subscribe_fields (list) – subscribe fields.
- open_cost (float) – open transaction cost. The default value is 0.002(0.2%).
- close_cost (float) – close transaction cost. The default value is 0.002(0.2%).
- min_cost (float) – min transaction cost.
- trade_unit (int) – 100 for China A.
- deal_price (str) – dealing price type: ‘close’, ‘open’, ‘vwap’.
- limit_threshold (float) – limit move 0.1 (10%) for example, long and short with same limit.
- extract_codes (bool) –
whether to pass the codes extracted from pred to the exchange.
Note
This will be faster with offline qlib.
- executor related arguments (-) –
- executor (BaseExecutor()) – executor used in backtest.
- verbose (bool) – whether to print log.
-
qlib.contrib.evaluate.
long_short_backtest
(pred, topk=50, deal_price=None, shift=1, open_cost=0, close_cost=0, trade_unit=None, limit_threshold=None, min_cost=5, subscribe_fields=[], extract_codes=False)¶ A backtest for long-short strategy
Parameters: - pred – The trading signal produced on day T.
- topk – The short topk securities and long topk securities.
- deal_price – The price to deal the trading.
- shift – Whether to shift prediction by one day. The trading day will be T+1 if shift==1.
- open_cost – open transaction cost.
- close_cost – close transaction cost.
- trade_unit – 100 for China A.
- limit_threshold – limit move 0.1 (10%) for example, long and short with same limit.
- min_cost – min transaction cost.
- subscribe_fields – subscribe fields.
- extract_codes – bool. Whether to pass the codes extracted from pred to the exchange. NOTE: This will be faster with offline qlib.
Returns: The result of the backtest, represented by a dict: {"long": long_returns(excess), "short": short_returns(excess), "long_short": long_short_returns}
Report¶
-
qlib.contrib.report.analysis_position.report.
report_graph
(report_df: pandas.core.frame.DataFrame, show_notebook: bool = True) → [<class 'list'>, <class 'tuple'>]¶ display backtest report
Example:
    import qlib.contrib.report as qcr
    from qlib.contrib.evaluate import backtest
    from qlib.contrib.strategy import TopkDropoutStrategy

    # backtest parameters
    bparas = {}
    bparas['limit_threshold'] = 0.095
    bparas['account'] = 1000000000

    sparas = {}
    sparas['topk'] = 50
    sparas['n_drop'] = 230
    strategy = TopkDropoutStrategy(**sparas)

    # pass the strategy by keyword: the second positional parameter of backtest is account
    report_normal_df, _ = backtest(pred_df, strategy=strategy, **bparas)

    qcr.analysis_position.report_graph(report_normal_df)
Parameters: - report_df –
df.index.name must be date, df.columns must contain return, turnover, cost, bench.
                  return      cost     bench  turnover
    date
    2017-01-04  0.003421  0.000864  0.011693  0.576325
    2017-01-05  0.000508  0.000447  0.000721  0.227882
    2017-01-06 -0.003321  0.000212 -0.004322  0.102765
    2017-01-09  0.006753  0.000212  0.006874  0.105864
    2017-01-10 -0.000416  0.000440 -0.003350  0.208396
- show_notebook – whether to display graphics in notebook, the default is True.
Returns: if show_notebook is True, display in notebook; else return plotly.graph_objs.Figure list.
-
qlib.contrib.report.analysis_position.score_ic.
score_ic_graph
(pred_label: pandas.core.frame.DataFrame, show_notebook: bool = True) → [<class 'list'>, <class 'tuple'>]¶ score IC
Example:
    import pandas as pd
    from qlib.data import D
    from qlib.contrib.report import analysis_position

    pred_df_dates = pred_df.index.get_level_values(level='datetime')
    features_df = D.features(D.instruments('csi500'), ['Ref($close, -2)/Ref($close, -1)-1'], pred_df_dates.min(), pred_df_dates.max())
    features_df.columns = ['label']
    pred_label = pd.concat([features_df, pred_df], axis=1, sort=True).reindex(features_df.index)
    analysis_position.score_ic_graph(pred_label)
Parameters: - pred_label –
index is pd.MultiIndex, index name is [instrument, datetime]; columns names is [score, label].
    instrument  datetime        score      label
    SH600004    2017-12-11  -0.013502  -0.013502
                2017-12-12  -0.072367  -0.072367
                2017-12-13  -0.068605  -0.068605
                2017-12-14   0.012440   0.012440
                2017-12-15  -0.102778  -0.102778
- show_notebook – whether to display graphics in notebook, the default is True.
Returns: if show_notebook is True, display in notebook; else return plotly.graph_objs.Figure list.
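IC (information coefficient) is commonly computed as the per-day correlation between score and label across instruments, with rank IC using ranks instead of raw values. The sketch below shows that computation in plain Python. It is an illustration rather than the plotting function's internals, the helper names are hypothetical, and ties in ranks are not averaged.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def ranks(xs):
    """Ascending ranks starting at 1 (simplification: ties are not averaged)."""
    order = sorted(range(len(xs)), key=xs.__getitem__)
    out = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        out[i] = float(rank)
    return out


def daily_ic(rows, rank=False):
    """rows: (datetime, score, label) tuples; returns {datetime: IC}."""
    by_date = defaultdict(lambda: ([], []))
    for dt, score, label in rows:
        by_date[dt][0].append(score)
        by_date[dt][1].append(label)
    result = {}
    for dt, (s, l) in sorted(by_date.items()):
        if rank:
            s, l = ranks(s), ranks(l)
        result[dt] = pearson(s, l)
    return result


rows = [
    ("2017-12-11", 0.1, 0.01), ("2017-12-11", 0.2, 0.02), ("2017-12-11", 0.3, 0.03),
    ("2017-12-12", 0.3, 0.00), ("2017-12-12", 0.2, 0.01), ("2017-12-12", 0.1, 0.05),
]
ic = daily_ic(rows)
rank_ic = daily_ic(rows, rank=True)
```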
-
qlib.contrib.report.analysis_position.cumulative_return.
cumulative_return_graph
(position: dict, report_normal: pandas.core.frame.DataFrame, label_data: pandas.core.frame.DataFrame, show_notebook=True, start_date=None, end_date=None) → Iterable[plotly.graph_objs._figure.Figure]¶ Backtest buy, sell, and holding cumulative return graph
Example:
    import qlib.contrib.report as qcr
    from qlib.data import D
    from qlib.contrib.evaluate import risk_analysis, backtest, long_short_backtest
    from qlib.contrib.strategy import TopkDropoutStrategy

    # backtest parameters
    bparas = {}
    bparas['limit_threshold'] = 0.095
    bparas['account'] = 1000000000

    sparas = {}
    sparas['topk'] = 50
    sparas['n_drop'] = 5
    strategy = TopkDropoutStrategy(**sparas)

    # pass the strategy by keyword: the second positional parameter of backtest is account
    report_normal_df, positions = backtest(pred_df, strategy=strategy, **bparas)

    pred_df_dates = pred_df.index.get_level_values(level='datetime')
    features_df = D.features(D.instruments('csi500'), ['Ref($close, -1)/$close - 1'], pred_df_dates.min(), pred_df_dates.max())
    features_df.columns = ['label']

    qcr.analysis_position.cumulative_return_graph(positions, report_normal_df, features_df)
- Graph desc:
- Axis X: Trading day.
- Axis Y:
- Above axis Y: (((Ref($close, -1)/$close - 1) * weight).sum() / weight.sum()).cumsum().
- Below axis Y: Daily weight sum.
- In the sell graph, y < 0 stands for profit; in other cases, y > 0 stands for profit.
- In the buy_minus_sell graph, the y value of the weight graph at the bottom is buy_weight + sell_weight.
- In each graph, the red line in the histogram on the right represents the average.
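The expression given for the upper Y axis is a weight-averaged daily label followed by a cumulative sum. A plain-Python sketch of that arithmetic follows; the function weighted_cumulative_return and the input layout are hypothetical.

```python
from itertools import accumulate


def weighted_cumulative_return(days):
    """days: one list of (label, weight) pairs per trading day."""
    daily = [
        sum(label * weight for label, weight in day) / sum(weight for _, weight in day)
        for day in days
    ]
    return list(accumulate(daily))  # cumsum over trading days


days = [
    [(0.02, 0.5), (-0.01, 0.5)],   # day 1: two stocks, equal weight
    [(0.04, 0.25), (0.00, 0.75)],  # day 2: unequal weights
]
curve = weighted_cumulative_return(days)
```

Here label plays the role of Ref($close, -1)/$close - 1 for each stock on that day.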
Parameters: - position – position data
- report_normal –
                  return      cost     bench  turnover
    date
    2017-01-04  0.003421  0.000864  0.011693  0.576325
    2017-01-05  0.000508  0.000447  0.000721  0.227882
    2017-01-06 -0.003321  0.000212 -0.004322  0.102765
    2017-01-09  0.006753  0.000212  0.006874  0.105864
    2017-01-10 -0.000416  0.000440 -0.003350  0.208396
- label_data – D.features result; index is pd.MultiIndex, index name is [instrument, datetime]; columns names is [label].
The label T is the change from T to T+1; it is recommended to use close, example: D.features(D.instruments('csi500'), ['Ref($close, -1)/$close-1'])
                                label
    instrument  datetime
    SH600004    2017-12-11  -0.013502
                2017-12-12  -0.072367
                2017-12-13  -0.068605
                2017-12-14   0.012440
                2017-12-15  -0.102778
- show_notebook – True or False. If True, show graph in notebook, else return figures
- start_date – start date
- end_date – end date
-
qlib.contrib.report.analysis_position.risk_analysis.
risk_analysis_graph
(analysis_df: pandas.core.frame.DataFrame = None, report_normal_df: pandas.core.frame.DataFrame = None, report_long_short_df: pandas.core.frame.DataFrame = None, show_notebook: bool = True) → Iterable[plotly.graph_objs._figure.Figure]¶ Generate analysis graph and monthly analysis
Example:
    import pandas as pd
    from qlib.contrib.evaluate import risk_analysis, backtest, long_short_backtest
    from qlib.contrib.strategy import TopkDropoutStrategy
    from qlib.contrib.report import analysis_position

    # backtest parameters
    bparas = {}
    bparas['limit_threshold'] = 0.095
    bparas['account'] = 1000000000

    sparas = {}
    sparas['topk'] = 50
    sparas['n_drop'] = 230
    strategy = TopkDropoutStrategy(**sparas)

    # pass the strategy by keyword: the second positional parameter of backtest is account
    report_normal_df, positions = backtest(pred_df, strategy=strategy, **bparas)

    # long_short_map = long_short_backtest(pred_df)
    # report_long_short_df = pd.DataFrame(long_short_map)

    analysis = dict()
    # analysis['pred_long'] = risk_analysis(report_long_short_df['long'])
    # analysis['pred_short'] = risk_analysis(report_long_short_df['short'])
    # analysis['pred_long_short'] = risk_analysis(report_long_short_df['long_short'])
    analysis['excess_return_without_cost'] = risk_analysis(report_normal_df['return'] - report_normal_df['bench'])
    analysis['excess_return_with_cost'] = risk_analysis(report_normal_df['return'] - report_normal_df['bench'] - report_normal_df['cost'])
    analysis_df = pd.concat(analysis)

    analysis_position.risk_analysis_graph(analysis_df, report_normal_df)
Parameters: - analysis_df –
analysis data, index is pd.MultiIndex; columns names is [risk].
                                                      risk
    excess_return_without_cost mean               0.000692
                               std                0.005374
                               annualized_return  0.174495
                               information_ratio  2.045576
                               max_drawdown      -0.079103
    excess_return_with_cost    mean               0.000499
                               std                0.005372
                               annualized_return  0.125625
                               information_ratio  1.473152
                               max_drawdown      -0.088263
- report_normal_df –
df.index.name must be date, df.columns must contain return, turnover, cost, bench.
                  return      cost     bench  turnover
    date
    2017-01-04  0.003421  0.000864  0.011693  0.576325
    2017-01-05  0.000508  0.000447  0.000721  0.227882
    2017-01-06 -0.003321  0.000212 -0.004322  0.102765
    2017-01-09  0.006753  0.000212  0.006874  0.105864
    2017-01-10 -0.000416  0.000440 -0.003350  0.208396
- report_long_short_df –
df.index.name must be date, df.columns contain long, short, long_short.
                    long     short  long_short
    date
    2017-01-04 -0.001360  0.001394    0.000034
    2017-01-05  0.002456  0.000058    0.002514
    2017-01-06  0.000120  0.002739    0.002859
    2017-01-09  0.001436  0.001838    0.003273
    2017-01-10  0.000824 -0.001944   -0.001120
- show_notebook – Whether to display graphics in a notebook, default True. If True, show the graph in the notebook; if False, return the graph figures.
-
qlib.contrib.report.analysis_position.rank_label.
rank_label_graph
(position: dict, label_data: pandas.core.frame.DataFrame, start_date=None, end_date=None, show_notebook=True) → Iterable[plotly.graph_objs._figure.Figure]¶ Ranking percentage of the stocks bought, sold, and held on the trading day. Average rank-ratio (similar to sell_df[‘label’].rank(ascending=False) / len(sell_df)) of daily trading
Example:
    import qlib.contrib.report as qcr
    from qlib.data import D
    from qlib.contrib.evaluate import backtest
    from qlib.contrib.strategy import TopkDropoutStrategy

    # backtest parameters
    bparas = {}
    bparas['limit_threshold'] = 0.095
    bparas['account'] = 1000000000

    sparas = {}
    sparas['topk'] = 50
    sparas['n_drop'] = 230
    strategy = TopkDropoutStrategy(**sparas)

    # pass the strategy by keyword: the second positional parameter of backtest is account
    _, positions = backtest(pred_df, strategy=strategy, **bparas)

    pred_df_dates = pred_df.index.get_level_values(level='datetime')
    features_df = D.features(D.instruments('csi500'), ['Ref($close, -1)/$close-1'], pred_df_dates.min(), pred_df_dates.max())
    features_df.columns = ['label']

    qcr.analysis_position.rank_label_graph(positions, features_df, pred_df_dates.min(), pred_df_dates.max())
Parameters: - position – position data; qlib.contrib.backtest.backtest.backtest result.
- label_data – D.features result; index is pd.MultiIndex, index name is [instrument, datetime]; columns names is [label].
The label T is the change from T to T+1, it is recommended to use
close
, example: D.features(D.instruments(‘csi500’), [‘Ref($close, -1)/$close-1’]).label instrument datetime SH600004 2017-12-11 -0.013502 2017-12-12 -0.072367 2017-12-13 -0.068605 2017-12-14 0.012440 2017-12-15 -0.102778
Parameters: - start_date – start date
- end_date – end date
- show_notebook – True or False. If True, show graph in notebook, else return figures.
Returns:
-
qlib.contrib.report.analysis_model.analysis_model_performance.
ic_figure
(ic_df: pandas.core.frame.DataFrame, show_nature_day=True, **kwargs) → plotly.graph_objs._figure.Figure¶ IC figure
Parameters: - ic_df – ic DataFrame
- show_nature_day – whether to display the abscissa of non-trading day
Returns: plotly.graph_objs.Figure
-
qlib.contrib.report.analysis_model.analysis_model_performance.
model_performance_graph
(pred_label: pandas.core.frame.DataFrame, lag: int = 1, N: int = 5, reverse=False, rank=False, graph_names: list = ['group_return', 'pred_ic', 'pred_autocorr'], show_notebook: bool = True, show_nature_day=True) → [<class 'list'>, <class 'tuple'>]¶ Model performance
Parameters: pred_label – index is pd.MultiIndex, index name is [instrument, datetime]; columns names is **[score, label]**. It is usually same as the label of model training(e.g. “Ref($close, -2)/Ref($close, -1) - 1”).
    instrument  datetime        score      label
    SH600004    2017-12-11  -0.013502  -0.013502
                2017-12-12  -0.072367  -0.072367
                2017-12-13  -0.068605  -0.068605
                2017-12-14   0.012440   0.012440
                2017-12-15  -0.102778  -0.102778
Parameters: - lag – pred.groupby(level=’instrument’)[‘score’].shift(lag). It will be only used in the auto-correlation computing.
- N – group number, default 5.
- reverse – if True, pred[‘score’] *= -1.
- rank – if True, calculate rank ic.
- graph_names – graph names; default ['group_return', 'pred_ic', 'pred_autocorr'], as given in the signature.
- show_notebook – whether to display graphics in notebook, the default is True.
- show_nature_day – whether to display the abscissa of non-trading day.
Returns: if show_notebook is True, display in notebook; else return plotly.graph_objs.Figure list.
Workflow¶
Experiment Manager¶
-
class
qlib.workflow.expm.
ExpManager
(uri: str, default_exp_name: Optional[str])¶ This is the ExpManager class for managing experiments. The API is designed similar to mlflow. (The link: https://mlflow.org/docs/latest/python_api/mlflow.html)
-
__init__
(uri: str, default_exp_name: Optional[str])¶ Initialize self. See help(type(self)) for accurate signature.
-
start_exp
(*, experiment_id: Optional[str] = None, experiment_name: Optional[str] = None, recorder_id: Optional[str] = None, recorder_name: Optional[str] = None, uri: Optional[str] = None, resume: bool = False, **kwargs)¶ Start an experiment. This method first gets or creates an experiment and then sets it as the active one.
Parameters: - experiment_id (str) – id of the active experiment.
- experiment_name (str) – name of the active experiment.
- recorder_id (str) – id of the recorder to be started.
- recorder_name (str) – name of the recorder to be started.
- uri (str) – the current tracking URI.
- resume (boolean) – whether to resume the experiment and recorder.
Returns: Return type: An active experiment.
-
end_exp
(recorder_status: str = 'SCHEDULED', **kwargs)¶ End an active experiment.
Parameters: - experiment_name (str) – name of the active experiment.
- recorder_status (str) – the status of the active recorder of the experiment.
-
create_exp
(experiment_name: Optional[str] = None)¶ Create an experiment.
Parameters: experiment_name (str) – the experiment name, which must be unique. Returns: Return type: An experiment object.
-
search_records
(experiment_ids=None, **kwargs)¶ Get a pandas DataFrame of records that fit the search criteria of the experiment. Inputs are the search criteria the user wants to apply.
Returns: A pandas.DataFrame of records, where each metric, parameter, and tag is expanded into its own column named metrics.*, params.*, and tags.* respectively. For records that don't have a particular metric, parameter, or tag, the value will be (NumPy) NaN, None, or None respectively.
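The column expansion described above can be pictured with a small sketch. It does not call Qlib or mlflow: `to_rows` and the sample `records` are hypothetical, and only the naming scheme (`metrics.*`, `params.*`, `tags.*`) and the fill values for missing entries are taken from the description.

```python
import math
from typing import Dict, List

# Hypothetical records; in practice they come from the tracking backend.
records = [
    {"metrics": {"ic": 0.05}, "params": {"lr": "0.1"}, "tags": {"owner": "alice"}},
    {"metrics": {}, "params": {"lr": "0.2"}, "tags": {}},
]


def to_rows(records: List[dict]) -> List[Dict[str, object]]:
    """Expand nested metrics/params/tags into flat, prefixed columns."""
    flat = [
        {
            f"{group}.{key}": value
            for group in ("metrics", "params", "tags")
            for key, value in rec.get(group, {}).items()
        }
        for rec in records
    ]
    columns = sorted({col for row in flat for col in row})

    def missing(col: str):
        # Missing metrics become NaN; missing params and tags become None.
        return math.nan if col.startswith("metrics.") else None

    return [{col: row.get(col, missing(col)) for col in columns} for row in flat]


rows = to_rows(records)
for row in rows:
    print(row)
```

Every row ends up with the same set of columns, which is what makes the result usable as a table.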
-
get_exp
(*, experiment_id=None, experiment_name=None, create: bool = True, start: bool = False)¶ Retrieve an experiment. This method covers both getting the active experiment and getting or creating a specific experiment.
When the user specifies an experiment id or name, the method will try to return that specific experiment. When the user does not provide an experiment id or name, the method will try to return the current active experiment. The create argument determines whether the method will automatically create a new experiment according to the user's specification if the experiment hasn't been created before.
If create is True:
If an active experiment exists:
- no id or name specified, return the active experiment.
- if id or name is specified, return the specified experiment. If no such experiment is found, create a new experiment with the given id or name. If start is set to True, the experiment is set to be active.
If no active experiment exists:
- no id or name specified, create a default experiment.
- if id or name is specified, return the specified experiment. If no such experiment is found, create a new experiment with the given id or name. If start is set to True, the experiment is set to be active.
Else if create is False:
If an active experiment exists:
- no id or name specified, return the active experiment.
- if id or name is specified, return the specified experiment. If no such experiment is found, raise an error.
If no active experiment exists:
- no id or name specified: if the default experiment exists, return it; otherwise, raise an error.
- if id or name is specified, return the specified experiment. If no such experiment is found, raise an error.
Parameters: - experiment_id (str) – id of the experiment to return.
- experiment_name (str) – name of the experiment to return.
- create (boolean) – create the experiment if it hasn't been created before.
- start (boolean) – start the new experiment if one is created.
Returns: Return type: An experiment object.
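The branching described for get_exp can be summarized in a small sketch. This is not Qlib's implementation: `resolve_experiment`, the plain `store` dict and the `"default"` experiment name are hypothetical stand-ins, used only to make the decision table concrete (the start flag is left out).

```python
from typing import Dict, Optional


def resolve_experiment(
    store: Dict[str, dict],
    active: Optional[str],
    name: Optional[str] = None,
    create: bool = True,
    default_name: str = "default",  # hypothetical name of the default experiment
) -> dict:
    """Mimic the documented get_exp branching on a plain dict (illustration only)."""
    if name is None:
        if active is not None:
            return store[active]  # nothing specified: the active experiment wins
        name = default_name  # no active experiment: fall back to the default one
    if name in store:
        return store[name]
    if not create:
        raise ValueError(f"experiment {name!r} not found and create=False")
    store[name] = {"name": name}  # create on demand
    return store[name]


store = {"exp_a": {"name": "exp_a"}}
print(resolve_experiment(store, active="exp_a"))             # the active experiment
print(resolve_experiment(store, active=None, name="exp_b"))  # created on demand
```

With create=False the same call raises instead of creating, which matches the "raise Error" branches of the description.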
-
delete_exp
(experiment_id=None, experiment_name=None)¶ Delete an experiment.
Parameters: - experiment_id (str) – the experiment id.
- experiment_name (str) – the experiment name.
-
default_uri
¶ Get the default tracking URI from qlib.config.C
-
uri
¶ Get the default tracking URI or current URI.
Returns: Return type: The tracking URI string.
-
set_uri
(uri: Optional[str] = None)¶ Set the current tracking URI and the corresponding variables.
Parameters: uri (str) –
-
list_experiments
()¶ List all the existing experiments.
Returns: A dictionary (name -> experiment) of the stored experiment information.
-
Experiment¶
-
class
qlib.workflow.exp.
Experiment
(id, name)¶ This is the Experiment class for each experiment being run. The API is designed similar to mlflow. (The link: https://mlflow.org/docs/latest/python_api/mlflow.html)
-
__init__
(id, name)¶ Initialize self. See help(type(self)) for accurate signature.
-
start
(*, recorder_id=None, recorder_name=None, resume=False)¶ Start the experiment and set it to be active. This method will also start a new recorder.
Parameters: - recorder_id (str) – the id of the recorder to be created.
- recorder_name (str) – the name of the recorder to be created.
- resume (bool) – whether to resume the first recorder
Returns: Return type: An active recorder.
-
end
(recorder_status='SCHEDULED')¶ End the experiment.
Parameters: recorder_status (str) – the status to set on the recorder when ending (SCHEDULED, RUNNING, FINISHED, FAILED).
-
create_recorder
(recorder_name=None)¶ Create a recorder for each experiment.
Parameters: recorder_name (str) – the name of the recorder to be created. Returns: Return type: A recorder object.
-
search_records
(**kwargs)¶ Get a pandas DataFrame of records that fit the search criteria of the experiment. Inputs are the search criteria the user wants to apply.
Returns: A pandas.DataFrame of records, where each metric, parameter, and tag is expanded into its own column named metrics.*, params.*, and tags.* respectively. For records that don't have a particular metric, parameter, or tag, the value will be (NumPy) NaN, None, or None respectively.
-
delete_recorder
(recorder_id)¶ Delete a recorder of the experiment.
Parameters: recorder_id (str) – the id of the recorder to be deleted.
-
get_recorder
(recorder_id=None, recorder_name=None, create: bool = True, start: bool = False)¶ Retrieve a Recorder for user. When user specify recorder id and name, the method will try to return the specific recorder. When user does not provide recorder id or name, the method will try to return the current active recorder. The create argument determines whether the method will automatically create a new recorder according to user’s specification if the recorder hasn’t been created before.
If create is True:
If an active recorder exists:
- no id or name specified, return the active recorder.
- if id or name is specified, return the specified recorder. If no such recorder is found, create a new recorder with the given id or name. If start is set to True, the recorder is set to be active.
If no active recorder exists:
- no id or name specified, create a new recorder.
- if id or name is specified, return the specified recorder. If no such recorder is found, create a new recorder with the given id or name. If start is set to True, the recorder is set to be active.
Else if create is False:
If an active recorder exists:
- no id or name specified, return the active recorder.
- if id or name is specified, return the specified recorder. If no such recorder is found, raise an error.
If no active recorder exists:
- no id or name specified, raise an error.
- if id or name is specified, return the specified recorder. If no such recorder is found, raise an error.
Parameters: - recorder_id (str) – the id of the recorder to be retrieved.
- recorder_name (str) – the name of the recorder to be retrieved.
- create (boolean) – create the recorder if it hasn’t been created before.
- start (boolean) – start the new recorder if one is created.
Returns: Return type: A recorder object.
-
list_recorders
(**flt_kwargs)¶ List all the existing recorders of this experiment. Please first get the experiment instance before calling this method. If user want to use the method R.list_recorders(), please refer to the related API document in QlibRecorder.
- flt_kwargs : dict
- filter recorders by conditions e.g. list_recorders(status=Recorder.STATUS_FI)
Returns: A dictionary (id -> recorder) of the stored recorder information.
-
Recorder¶
-
class
qlib.workflow.recorder.
Recorder
(experiment_id, name)¶ This is the Recorder class for logging the experiments. The API is designed similar to mlflow. (The link: https://mlflow.org/docs/latest/python_api/mlflow.html)
The status of the recorder can be SCHEDULED, RUNNING, FINISHED, FAILED.
-
__init__
(experiment_id, name)¶ Initialize self. See help(type(self)) for accurate signature.
-
save_objects
(local_path=None, artifact_path=None, **kwargs)¶ Save objects such as prediction file or model checkpoints to the artifact URI. User can save object through keywords arguments (name:value).
Parameters: - local_path (str) – if provided, then save the file or directory to the artifact URI.
- artifact_path (str) – the relative path for the artifact to be stored in the URI; defaults to None.
-
load_object
(name)¶ Load objects such as prediction file or model checkpoints.
Parameters: name (str) – name of the file to be loaded. Returns: Return type: The saved object.
-
start_run
()¶ Start running or resuming the Recorder. The return value can be used as a context manager within a with block; otherwise, you must call end_run() to terminate the current run. (See ActiveRun class in mlflow)
Returns: Return type: An active running object (e.g. mlflow.ActiveRun object)
-
end_run
()¶ End an active Recorder.
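The two ways of pairing start_run with end_run mentioned above (a with block, or an explicit end_run call) can be illustrated with a toy class. `ToyRecorder` is hypothetical and is not the Qlib Recorder; only the status names come from this reference, and the optional status argument of its end_run is an addition made for the sketch.

```python
class ToyRecorder:
    """Hypothetical stand-in for a recorder; not the Qlib class."""

    def __init__(self):
        self.status = "SCHEDULED"

    def start_run(self):
        self.status = "RUNNING"
        return self  # the returned object doubles as a context manager

    def end_run(self, status="FINISHED"):
        self.status = status

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Mark the run FAILED if the block raised, FINISHED otherwise.
        self.end_run("FAILED" if exc_type else "FINISHED")
        return False  # do not swallow exceptions


# Style 1: the with block ends the run automatically.
rec = ToyRecorder()
with rec.start_run():
    pass
print(rec.status)

# Style 2: without a with block, end_run() must be called explicitly.
rec2 = ToyRecorder()
rec2.start_run()
rec2.end_run()
print(rec2.status)
```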
-
log_params
(**kwargs)¶ Log a batch of params for the current run.
Parameters: arguments (keyword) – key, value pair to be logged as parameters.
-
log_metrics
(step=None, **kwargs)¶ Log multiple metrics for the current run.
Parameters: arguments (keyword) – key, value pair to be logged as metrics.
Log a batch of tags for the current run.
Parameters: arguments (keyword) – key, value pair to be logged as tags.
Delete some tags from a run.
Parameters: keys (series of strs of the keys) – all the name of the tag to be deleted.
-
list_artifacts
(artifact_path: str = None)¶ List all the artifacts of a recorder.
Parameters: artifact_path (str) – the relative path for the artifact to be stored in the URI. Returns: Return type: A list of artifacts information (name, path, etc.) that being stored.
-
list_metrics
()¶ List all the metrics of a recorder.
Returns: Return type: A dictionary of metrics that being stored.
-
list_params
()¶ List all the params of a recorder.
Returns: Return type: A dictionary of params that being stored.
List all the tags of a recorder.
Returns: Return type: A dictionary of tags that being stored.
-
Record Template¶
-
class
qlib.workflow.record_temp.
RecordTemp
(recorder)¶ This is the Records Template class that enables user to generate experiment results such as IC and backtest in a certain format.
-
__init__
(recorder)¶ Initialize self. See help(type(self)) for accurate signature.
-
generate
(**kwargs)¶ Generate certain records such as IC, backtest etc., and save them.
Parameters: kwargs –
-
load
(name)¶ Load the stored records. Because of difficulties in balancing a clean API with Python's inheritance, this method currently has to be used in a rather awkward way; we will try to improve it in the future:
    sar = SigAnaRecord(recorder)
    ic = sar.load(sar.get_path("ic.pkl"))
Parameters: name (str) – the name of the file to be loaded. Returns: The stored records.
-
list
()¶ List the stored records.
Returns: Return type: A list of all the stored records.
-
check
(parent=False)¶ Check whether the records are properly generated and saved.
Raises: FileExistsError – if the records are not stored properly.
-
-
class
qlib.workflow.record_temp.
SignalRecord
(model=None, dataset=None, recorder=None)¶ This is the Signal Record class that generates the signal prediction. This class inherits the RecordTemp class.
-
__init__
(model=None, dataset=None, recorder=None)¶ Initialize self. See help(type(self)) for accurate signature.
-
generate
(**kwargs)¶ Generate certain records such as IC, backtest etc., and save them.
Parameters: kwargs –
-
list
()¶ List the stored records.
Returns: Return type: A list of all the stored records.
-
load
(name='pred.pkl')¶ Load the stored records. Because of difficulties in balancing a clean API with Python's inheritance, this method currently has to be used in a rather awkward way; we will try to improve it in the future:
    sar = SigAnaRecord(recorder)
    ic = sar.load(sar.get_path("ic.pkl"))
Parameters: name (str) – the name of the file to be loaded. Returns: The stored records.
-
-
class
qlib.workflow.record_temp.
HFSignalRecord
(recorder, **kwargs)¶ This is the Signal Analysis Record class that generates the analysis results such as IC and IR. This class inherits the RecordTemp class.
-
__init__
(recorder, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
generate
()¶ Generate certain records such as IC, backtest etc., and save them.
Parameters: kwargs –
-
list
()¶ List the stored records.
Returns: Return type: A list of all the stored records.
-
-
class
qlib.workflow.record_temp.
SigAnaRecord
(recorder, ana_long_short=False, ann_scaler=252, label_col=0, **kwargs)¶ This is the Signal Analysis Record class that generates the analysis results such as IC and IR. This class inherits the RecordTemp class.
-
__init__
(recorder, ana_long_short=False, ann_scaler=252, label_col=0, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
generate
(**kwargs)¶ Generate certain records such as IC, backtest etc., and save them.
Parameters: kwargs –
-
list
()¶ List the stored records.
Returns: Return type: A list of all the stored records.
-
-
class
qlib.workflow.record_temp.
PortAnaRecord
(recorder, config, **kwargs)¶ This is the Portfolio Analysis Record class that generates the analysis results such as those of backtest. This class inherits the RecordTemp class.
The following files will be stored in the recorder:
- report_normal.pkl & positions_normal.pkl: the return report and detailed positions of the backtest, returned by qlib/contrib/evaluate.py:backtest
- port_analysis.pkl: the risk analysis of your portfolio, returned by qlib/contrib/evaluate.py:risk_analysis
-
__init__
(recorder, config, **kwargs)¶ - config[“strategy”] : dict
- define the strategy class as well as the kwargs.
- config[“backtest”] : dict
- define the backtest kwargs.
-
generate
(**kwargs)¶ Generate certain records such as IC, backtest etc., and save them.
Parameters: kwargs –
-
list
()¶ List the stored records.
Returns: Return type: A list of all the stored records.
Task Management¶
TaskGen¶
TaskGenerator module can generate many tasks based on TaskGen and some task templates.
-
qlib.workflow.task.gen.
task_generator
(tasks, generators) → list¶ Use a list of TaskGen and a list of task templates to generate different tasks.
For example:
There are 3 task templates a, b, c and 2 TaskGen A, B. A generates 2 tasks from a template and B generates 3 tasks from a template. task_generator([a, b, c], [A, B]) will finally generate 3*2*3 = 18 tasks.
Returns: a list of tasks
Return type: list
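The 3*2*3 = 18 count follows from applying every generator to every task produced so far. A minimal sketch of that composition, assuming only that each generator exposes generate(task) -> list as TaskGen does; `RepeatGen` and `compose` are hypothetical names and not part of Qlib:

```python
from typing import List


class RepeatGen:
    """Hypothetical generator: yields `n` variants of a task template."""

    def __init__(self, key: str, n: int):
        self.key, self.n = key, n

    def generate(self, task: dict) -> List[dict]:
        return [{**task, self.key: i} for i in range(self.n)]


def compose(tasks: List[dict], generators: list) -> List[dict]:
    """Apply each generator in turn to every task produced so far."""
    for gen in generators:
        tasks = [new for t in tasks for new in gen.generate(t)]
    return tasks


templates = [{"name": n} for n in "abc"]                            # templates a, b, c
tasks = compose(templates, [RepeatGen("A", 2), RepeatGen("B", 3)])  # generators A, B
print(len(tasks))  # 18
```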
-
class
qlib.workflow.task.gen.
TaskGen
¶ The base class for generating different tasks
Example 1:
input: a specific task template and rolling steps
output: rolling version of the tasks
Example 2:
input: a specific task template and losses list
output: a set of tasks with different losses
-
generate
(task: dict) → List[dict]¶ Generate different tasks based on a task template
Parameters: task (dict) – a task template Returns: A list of tasks Return type: typing.List[dict]
-
-
qlib.workflow.task.gen.
handler_mod
(task: dict, rolling_gen)¶ Help to modify the handler end time when using RollingGen
Parameters: - task (dict) – a task template
- rg (RollingGen) – an instance of RollingGen
-
class
qlib.workflow.task.gen.
RollingGen
(step: int = 40, rtype: str = 'expanding', ds_extra_mod_func: Union[None, Callable] = <function handler_mod>)¶ -
__init__
(step: int = 40, rtype: str = 'expanding', ds_extra_mod_func: Union[None, Callable] = <function handler_mod>)¶ Generate tasks for rolling
Parameters: - step (int) – the rolling step
- rtype (str) – rolling type (expanding, sliding)
- ds_extra_mod_func (Callable) – A method like: handler_mod(task: dict, rg: RollingGen)
Do some extra action after generating a task. For example, use
handler_mod
to modify the end time of the handler of a dataset.
-
gen_following_tasks
(task: dict, test_end: pandas._libs.tslibs.timestamps.Timestamp) → List[dict]¶ generating following rolling tasks for task until test_end
Parameters: - task (dict) – Qlib task format
- test_end (pd.Timestamp) – the latest rolling task includes test_end
Returns: the tasks that follow task (task itself is excluded)
Return type: List[dict]
-
generate
(task: dict) → List[dict]¶ Converting the task into a rolling task.
Parameters: task (dict) – A dict describing a task. For example.
    DEFAULT_TASK = {
        "model": {
            "class": "LGBModel",
            "module_path": "qlib.contrib.model.gbdt",
        },
        "dataset": {
            "class": "DatasetH",
            "module_path": "qlib.data.dataset",
            "kwargs": {
                "handler": {
                    "class": "Alpha158",
                    "module_path": "qlib.contrib.data.handler",
                    "kwargs": {
                        "start_time": "2008-01-01",
                        "end_time": "2020-08-01",
                        "fit_start_time": "2008-01-01",
                        "fit_end_time": "2014-12-31",
                        "instruments": "csi100",
                    },
                },
                "segments": {
                    "train": ("2008-01-01", "2014-12-31"),
                    "valid": ("2015-01-01", "2016-12-20"),  # Please avoid leaking the future test data into validation
                    "test": ("2017-01-01", "2020-08-01"),
                },
            },
        },
        "record": [
            {
                "class": "SignalRecord",
                "module_path": "qlib.workflow.record_temp",
            },
        ]
    }
Returns: a list of tasks Return type: List[dict]
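To make the difference between the two rolling types concrete, here is a simplified sketch. It is not RollingGen's code: `roll_segments` is a hypothetical helper, days are integer positions in a trading calendar rather than dates, and it assumes that "expanding" keeps the start of the training window fixed while "sliding" moves every boundary forward by step.

```python
from typing import Dict, List, Tuple

Segments = Dict[str, Tuple[int, int]]


def roll_segments(seg: Segments, step: int, rtype: str, last_day: int) -> List[Segments]:
    """Shift the train/valid/test windows forward by `step` trading days at a time."""
    out = []
    shift = 0
    while seg["test"][0] + shift <= last_day:
        rolled = {}
        for name, (start, end) in seg.items():
            new_start = start + shift
            if name == "train" and rtype == "expanding":
                new_start = start  # expanding: the training window only grows
            rolled[name] = (new_start, min(end + shift, last_day))
        out.append(rolled)
        shift += step
    return out


base = {"train": (0, 99), "valid": (100, 119), "test": (120, 159)}
for s in roll_segments(base, step=40, rtype="expanding", last_day=239):
    print(s)
```

Switching rtype to "sliding" keeps the training window at a constant length instead of letting it grow.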
-
TaskManager¶
TaskManager can fetch unused tasks automatically and manage the lifecycle of a set of tasks with error handling. These features make it possible to run tasks concurrently while ensuring that every task is used only once. TaskManager stores all tasks in MongoDB, so users MUST finish the configuration of MongoDB before using this module.
A task in TaskManager consists of 3 parts:
- task description: the description defines the task
- task status: the status of the task
- task result: a user can get the task with the task description and task result.
-
class
qlib.workflow.task.manage.
TaskManager
(task_pool: str)¶ Here is what a task looks like when it is created by TaskManager:
    {
        'def': pickle serialized task definition. using pickle will make it easier
        'filter': json-like data. This is for filtering the tasks.
        'status': 'waiting' | 'running' | 'done'
        'res': pickle serialized task result,
    }
The task manager assumes that you will only update the tasks you fetched. MongoDB's fetch-one-and-update operation keeps the data update safe.
This class can be used as a tool from the command line. Here are several examples:
    python -m qlib.workflow.task.manage -t <pool_name> wait
    python -m qlib.workflow.task.manage -t <pool_name> task_stat
Note
Assumption: the data in MongoDB was encoded and the data out of MongoDB was decoded
There are four statuses:
STATUS_WAITING: waiting for training
STATUS_RUNNING: training
STATUS_PART_DONE: finished some step and waiting for next step
STATUS_DONE: all work done
-
__init__
(task_pool: str)¶ Init Task Manager. Remember to declare the MongoDB url and database name first. A TaskManager instance serves a specific task pool. The static methods of this module serve the whole MongoDB.
Parameters: task_pool (str) – the name of Collection in MongoDB
-
static
list
() → list¶ List all the collections (task pools) of the db.
Returns: list
-
replace_task
(task, new_task)¶ Use a new task to replace an old one
Parameters: - task – old task
- new_task – new task
-
insert_task
(task)¶ Insert a task.
Parameters: task – the task waiting for insert Returns: pymongo.results.InsertOneResult
-
insert_task_def
(task_def)¶ Insert a task to task_pool
Parameters: task_def (dict) – the task definition Returns: Return type: pymongo.results.InsertOneResult
-
create_task
(task_def_l, dry_run=False, print_nt=False) → List[str]¶ If the tasks in task_def_l are new, then insert new tasks into the task_pool, and record inserted_id. If a task is not new, then just query its _id.
Parameters: - task_def_l (list) – a list of task definitions
- dry_run (bool) – whether to insert those new tasks into the task pool
- print_nt (bool) – whether to print the new tasks
Returns: a list of the _id of task_def_l
Return type: List[str]
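The insert-if-new behaviour of create_task can be sketched without MongoDB. The helper below is hypothetical: it keeps the pool in a plain dict, uses a canonical JSON string as the id (the real _id is assigned by the database), and leaves dry_run and print_nt out.

```python
import json
from typing import Dict, List


def create_tasks(pool: Dict[str, dict], task_defs: List[dict]) -> List[str]:
    """Insert unseen task definitions into `pool`; return one id per definition."""
    ids = []
    for task_def in task_defs:
        key = json.dumps(task_def, sort_keys=True)  # canonical form used as the id
        if key not in pool:
            pool[key] = {"def": task_def, "status": "waiting"}  # new task
        ids.append(key)  # existing task: just report its id
    return ids


pool: Dict[str, dict] = {}
first = create_tasks(pool, [{"x": 1}, {"x": 2}])
second = create_tasks(pool, [{"x": 1}, {"x": 3}])
print(len(pool))  # 3
```

Submitting the same definition twice returns the same id and leaves the pool unchanged, which is what lets a task list be resubmitted safely.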
-
fetch_task
(query={}, status='waiting') → dict¶ Use query to fetch tasks.
Parameters: - query (dict, optional) – query dict. Defaults to {}.
- status (str, optional) – the status of the tasks to fetch. Defaults to STATUS_WAITING.
Returns: a task(document in collection) after decoding
Return type: dict
-
safe_fetch_task
(query={}, status='waiting')¶ Fetch task from task_pool using query with contextmanager
Parameters: query (dict) – the dict of query Returns: a task (document in collection) after decoding Return type: dict
-
query
(query={}, decode=True)¶ Query task in collection. This function may raise exception pymongo.errors.CursorNotFound: cursor id not found if it takes too long to iterate the generator
Parameters: - query (dict) – the dict of query
- decode (bool) –
Returns: a task (document in collection) after decoding
Return type: dict
-
re_query
(_id) → dict¶ Use _id to query task.
Parameters: _id (str) – _id of a document Returns: a task(document in collection) after decoding Return type: dict
-
commit_task_res
(task, res, status='done')¶ Commit the result to task[‘res’].
Parameters: - task ([type]) – [description]
- res (object) – the result you want to save
- status (str, optional) – STATUS_WAITING, STATUS_RUNNING, STATUS_DONE, STATUS_PART_DONE. Defaults to STATUS_DONE.
-
return_task
(task, status='waiting')¶ Return a task to the given status. Always used in error handling.
Parameters: - task ([type]) – [description]
- status (str, optional) – STATUS_WAITING, STATUS_RUNNING, STATUS_DONE, STATUS_PART_DONE. Defaults to STATUS_WAITING.
-
remove
(query={})¶ Remove the task using query
Parameters: query (dict) – the dict of query
-
task_stat
(query={}) → dict¶ Count the tasks in every status.
Parameters: query (dict, optional) – the query dict. Defaults to {}. Returns: dict
-
reset_waiting
(query={})¶ Reset all running tasks to waiting status. Can be used when some running tasks exit unexpectedly.
Parameters: query (dict, optional) – the query dict. Defaults to {}.
-
prioritize
(task, priority: int)¶ Set priority for task
Parameters: - task (dict) – The task query from the database
- priority (int) – the target priority
-
wait
(query={})¶ When multiprocessing, the main process may fetch nothing from TaskManager because there are still some running tasks. So the main process should wait until all tasks have been trained by other processes or machines.
Parameters: query (dict, optional) – the query dict. Defaults to {}.
-
-
qlib.workflow.task.manage.
run_task
(task_func: Callable, task_pool: str, query: dict = {}, force_release: bool = False, before_status: str = 'waiting', after_status: str = 'done', **kwargs)¶ While the task pool is not empty (has WAITING tasks), use task_func to fetch and run tasks in task_pool
After running this method, here are 4 situations (before_status -> after_status):
STATUS_WAITING -> STATUS_DONE: use task[“def”] as task_func param, it means that the task has not been started
STATUS_WAITING -> STATUS_PART_DONE: use task[“def”] as task_func param
STATUS_PART_DONE -> STATUS_PART_DONE: use task[“res”] as task_func param, it means that the task has been started but not completed
STATUS_PART_DONE -> STATUS_DONE: use task[“res”] as task_func param
Parameters: - task_func (Callable) – the function to run the task, with the signature def (task_def, **kwargs) -> <res which will be committed>
- task_pool (str) – the name of the task pool (Collection in MongoDB)
- query (dict) – will use this dict to query task_pool when fetching task
- force_release (bool) – whether the program will force the release of the resource
- before_status (str) – the tasks in before_status will be fetched and trained. Can be STATUS_WAITING, STATUS_PART_DONE.
- after_status (str) – the tasks after trained will become after_status. Can be STATUS_WAITING, STATUS_PART_DONE.
- kwargs – the params for task_func
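The four status transitions listed above can be traced with an in-memory sketch. It is not run_task itself: `run_pool` is a hypothetical helper, the pool is a plain list instead of a MongoDB collection, and the string "part_done" is an assumed value for STATUS_PART_DONE.

```python
from typing import Callable, List


def run_pool(task_func: Callable, pool: List[dict],
             before_status: str = "waiting", after_status: str = "done") -> int:
    """Run every task whose status equals `before_status`; return how many ran."""
    ran = 0
    for task in pool:
        if task["status"] != before_status:
            continue
        task["status"] = "running"
        # A fresh task passes its definition; a partly done task passes its last result.
        param = task["def"] if before_status == "waiting" else task["res"]
        try:
            task["res"] = task_func(param)
            task["status"] = after_status
        except Exception:
            task["status"] = before_status  # hand the task back on failure
            raise
        ran += 1
    return ran


pool = [{"def": n, "status": "waiting", "res": None} for n in (1, 2, 3)]
run_pool(lambda d: d * 10, pool, after_status="part_done")  # waiting -> part_done
run_pool(lambda r: r + 1, pool, before_status="part_done")  # part_done -> done
print([t["res"] for t in pool])  # [11, 21, 31]
```

The second call receives each task's previous result rather than its definition, mirroring the use of task["res"] for tasks that have already been started.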
Trainer¶
The Trainer will train a list of tasks and return a list of model recorders. There are two steps in each Trainer: train (make the model recorders) and end_train (modify the model recorders).
There is a concept called DelayTrainer, which can be used in online simulation for parallel training. In DelayTrainer, the first step only saves the necessary information to the model recorders, and the second step, which is finished at the end, can do concurrent and time-consuming operations such as model fitting.
Qlib offers two kinds of Trainer: TrainerR is the simplest way, and TrainerRM is based on TaskManager to help manage the task lifecycle automatically.
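The split between the two steps can be sketched with two toy classes. They are hypothetical and unrelated to the Qlib classes of similar names; plain dicts stand in for model recorders and a callable `fit` stands in for model fitting.

```python
from typing import Callable, List


class ToyTrainer:
    """Hypothetical trainer: fits in train(); end_train() only tags the records."""

    def __init__(self, fit: Callable[[dict], str]):
        self.fit = fit

    def train(self, tasks: List[dict]) -> List[dict]:
        return [{"task": t, "model": self.fit(t)} for t in tasks]

    def end_train(self, recs: List[dict]) -> List[dict]:
        for r in recs:
            r["ended"] = True
        return recs


class ToyDelayTrainer(ToyTrainer):
    """Hypothetical delayed variant: train() only records the task."""

    def train(self, tasks: List[dict]) -> List[dict]:
        return [{"task": t, "model": None} for t in tasks]

    def end_train(self, recs: List[dict]) -> List[dict]:
        for r in recs:
            r["model"] = self.fit(r["task"])  # the real fitting happens here
        return super().end_train(recs)


def fit(task: dict) -> str:
    return f"model-for-{task['name']}"


trainer = ToyDelayTrainer(fit)
recs = trainer.train([{"name": "a"}, {"name": "b"}])
print(recs[0]["model"])  # None: nothing has been fitted yet
recs = trainer.end_train(recs)
print(recs[0]["model"])  # model-for-a
```

Because the delayed variant defers fitting to end_train, the expensive step can be batched or parallelized after all tasks have been registered.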
-
qlib.model.trainer.
begin_task_train
(task_config: dict, experiment_name: str, recorder_name: str = None) → qlib.workflow.recorder.Recorder¶ Begin task training to start a recorder and save the task config.
Parameters: - task_config (dict) – the config of a task
- experiment_name (str) – the name of experiment
- recorder_name (str) – the given name will be the recorder name. None for using rid.
Returns: the model recorder
Return type: Recorder
-
qlib.model.trainer.
end_task_train
(rec: qlib.workflow.recorder.Recorder, experiment_name: str) → qlib.workflow.recorder.Recorder¶ Finish task training with real model fitting and saving.
Parameters: - rec (Recorder) – the recorder will be resumed
- experiment_name (str) – the name of experiment
Returns: the model recorder
Return type: Recorder
-
qlib.model.trainer.
task_train
(task_config: dict, experiment_name: str) → qlib.workflow.recorder.Recorder¶ Task based training, will be divided into two steps.
Parameters: - task_config (dict) – The config of a task.
- experiment_name (str) – The name of experiment
Returns: The instance of the recorder
Return type: Recorder
-
class
qlib.model.trainer.
Trainer
¶ The trainer can train a list of models. There are Trainer and DelayTrainer, which can be distinguished by when it will finish real training.
-
__init__
()¶ Initialize self. See help(type(self)) for accurate signature.
-
train
(tasks: list, *args, **kwargs) → list¶ Given a list of task definitions, begin training, and return the models.
For Trainer, it finishes real training in this method. For DelayTrainer, it only does some preparation in this method.
Parameters: tasks – a list of tasks Returns: a list of models Return type: list
-
end_train
(models: list, *args, **kwargs) → list¶ Given a list of models, finish any remaining work at the end of training if needed. The models may be Recorder, txt file, database, and so on.
For Trainer, it does some finishing touches in this method. For DelayTrainer, it finishes real training in this method.
Parameters: models – a list of models Returns: a list of models Return type: list
-
is_delay
() → bool¶ Whether the Trainer delays the real training until end_train.
Returns: if DelayTrainer Return type: bool
-
-
class
qlib.model.trainer.
TrainerR
(experiment_name: str = None, train_func: Callable = <function task_train>)¶ Trainer based on (R)ecorder. It will train a list of tasks and return a list of model recorders in a linear way.
Assumption: models were defined by task and the results will be saved to Recorder.
-
__init__
(experiment_name: str = None, train_func: Callable = <function task_train>)¶ Init TrainerR.
Parameters: - experiment_name (str, optional) – the default name of experiment.
- train_func (Callable, optional) – default training method. Defaults to task_train.
-
train
(tasks: list, train_func: Callable = None, experiment_name: str = None, **kwargs) → List[qlib.workflow.recorder.Recorder]¶ Given a list of tasks, return a list of trained Recorders. The order is guaranteed.
Parameters: - tasks (list) – a list of definitions based on task dict
- train_func (Callable) – the training method which needs at least tasks and experiment_name. None for the default training method.
- experiment_name (str) – the experiment name, None for use default name.
- kwargs – the params for train_func.
Returns: a list of Recorders
Return type: List[Recorder]
-
-
class
qlib.model.trainer.
DelayTrainerR
(experiment_name: str = None, train_func=<function begin_task_train>, end_train_func=<function end_task_train>)¶ A delayed implementation based on TrainerR, which means train method may only do some preparation and end_train method can do the real model fitting.
-
__init__
(experiment_name: str = None, train_func=<function begin_task_train>, end_train_func=<function end_task_train>)¶ Init DelayTrainerR.
Parameters: - experiment_name (str) – the default name of experiment.
- train_func (Callable, optional) – default train method. Defaults to begin_task_train.
- end_train_func (Callable, optional) – default end_train method. Defaults to end_task_train.
-
end_train
(recs, end_train_func=None, experiment_name: str = None, **kwargs) → List[qlib.workflow.recorder.Recorder]¶ Given a list of Recorders, return a list of trained Recorders. This method finishes the real data loading and model fitting.
Parameters: - recs (list) – a list of Recorder, the tasks have been saved to them
- end_train_func (Callable, optional) – the end_train method which needs at least recorders and experiment_name. Defaults to None for using self.end_train_func.
- experiment_name (str) – the experiment name, None for use default name.
- kwargs – the params for end_train_func.
Returns: a list of Recorders
Return type: List[Recorder]
-
-
class
qlib.model.trainer.
TrainerRM
(experiment_name: str = None, task_pool: str = None, train_func=<function task_train>, skip_run_task: bool = False)¶ Trainer based on (R)ecorder and Task(M)anager. It can train a list of tasks and return a list of model recorders in a multiprocessing way.
Assumption: task will be saved to TaskManager and task will be fetched and trained from TaskManager
-
__init__
(experiment_name: str = None, task_pool: str = None, train_func=<function task_train>, skip_run_task: bool = False)¶ Init TrainerRM.
Parameters: - experiment_name (str) – the default name of experiment.
- task_pool (str) – task pool name in TaskManager. None for use same name as experiment_name.
- train_func (Callable, optional) – default training method. Defaults to task_train.
- skip_run_task (bool) – If True, train skips run_task and the tasks are only run in the worker. Otherwise, train runs the tasks itself.
-
train
(tasks: list, train_func: Callable = None, experiment_name: str = None, before_status: str = 'waiting', after_status: str = 'done', **kwargs) → List[qlib.workflow.recorder.Recorder]¶ Given a list of tasks, return a list of trained Recorders. The order is guaranteed.
This method defaults to a single process, but TaskManager offers a convenient way to parallelize training. Users can customize their train_func to use multiple processes or even multiple machines.
Parameters: - tasks (list) – a list of definitions based on task dict
- train_func (Callable) – the training method, which needs at least tasks and experiment_name. None for the default training method.
- experiment_name (str) – the experiment name, None for use default name.
- before_status (str) – the tasks in before_status will be fetched and trained. Can be STATUS_WAITING, STATUS_PART_DONE.
- after_status (str) – the tasks after trained will become after_status. Can be STATUS_WAITING, STATUS_PART_DONE.
- kwargs – the params for train_func.
Returns: a list of Recorders
Return type: List[Recorder]
-
end_train
(recs: list, **kwargs) → List[qlib.workflow.recorder.Recorder]¶ Set STATUS_END tag to the recorders.
Parameters: recs (list) – a list of trained recorders. Returns: the same list as the param. Return type: List[Recorder]
-
worker
(train_func: Callable = None, experiment_name: str = None)¶ The multiprocessing method for train. It can share the same task_pool with train and can run in other processes or on other machines.
Parameters: - train_func (Callable) – the training method, which needs at least tasks and experiment_name. None for the default training method.
- experiment_name (str) – the experiment name, None for use default name.
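The interplay of a shared task pool, before_status and after_status can be sketched without Qlib or MongoDB. ToyTaskPool and run_worker below are hypothetical names; the real TaskManager stores tasks in a database and the status constants here are only assumed to mirror its naming.

```python
from typing import Callable, Dict, List, Optional

STATUS_WAITING = "waiting"  # assumed status names, for illustration only
STATUS_DONE = "done"


class ToyTaskPool:
    """Hypothetical in-memory task pool shared by trainers and workers."""

    def __init__(self) -> None:
        self.tasks: List[Dict] = []

    def add(self, definition: dict) -> None:
        self.tasks.append({"def": definition, "status": STATUS_WAITING, "result": None})

    def fetch(self, status: str) -> Optional[Dict]:
        # Return the first task in the requested status, or None if none is left.
        for task in self.tasks:
            if task["status"] == status:
                return task
        return None


def run_worker(
    pool: ToyTaskPool,
    train_func: Callable[[dict], object],
    before_status: str = STATUS_WAITING,
    after_status: str = STATUS_DONE,
) -> int:
    """Train tasks until none is left in before_status; return how many were trained."""
    trained = 0
    while (task := pool.fetch(before_status)) is not None:
        task["result"] = train_func(task["def"])
        task["status"] = after_status
        trained += 1
    return trained


pool = ToyTaskPool()
for name in ("lgb", "xgb", "lstm"):
    pool.add({"model": name})

n_first = run_worker(pool, lambda d: d["model"].upper())   # trains all three tasks
n_second = run_worker(pool, lambda d: d["model"].upper())  # finds nothing left to do
```

Any number of such workers could poll the same pool; each task changes status once it is trained, so a second worker simply finds nothing left in before_status.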
-
-
class
qlib.model.trainer.
DelayTrainerRM
(experiment_name: str = None, task_pool: str = None, train_func=<function begin_task_train>, end_train_func=<function end_task_train>, skip_run_task: bool = False)¶ A delayed implementation based on TrainerRM, which means train method may only do some preparation and end_train method can do the real model fitting.
-
__init__
(experiment_name: str = None, task_pool: str = None, train_func=<function begin_task_train>, end_train_func=<function end_task_train>, skip_run_task: bool = False)¶ Init DelayTrainerRM.
Parameters: - experiment_name (str) – the default name of experiment.
- task_pool (str) – task pool name in TaskManager. None for use same name as experiment_name.
- train_func (Callable, optional) – default train method. Defaults to begin_task_train.
- end_train_func (Callable, optional) – default end_train method. Defaults to end_task_train.
- skip_run_task (bool) – If True, train skips run_task and the tasks are only run in the worker. Otherwise, train runs the tasks itself. For example, start the trainer on a CPU VM and then wait for the tasks to be finished on GPU VMs.
-
train
(tasks: list, train_func=None, experiment_name: str = None, **kwargs) → List[qlib.workflow.recorder.Recorder]¶ Same as train of TrainerRM, after_status will be STATUS_PART_DONE.
Parameters: - tasks (list) – a list of definitions based on task dict
- train_func (Callable) – the train method, which needs at least tasks and experiment_name. Defaults to None for using self.train_func.
- experiment_name (str) – the experiment name, None for use default name.
Returns: a list of Recorders
Return type: List[Recorder]
-
end_train
(recs, end_train_func=None, experiment_name: str = None, **kwargs) → List[qlib.workflow.recorder.Recorder]¶ Given a list of Recorders, return a list of trained Recorders. This method finishes the real data loading and model fitting.
Parameters: - recs (list) – a list of Recorders to which the tasks have been saved.
- end_train_func (Callable, optional) – the end_train method, which needs at least recorders and experiment_name. Defaults to None for using self.end_train_func.
- experiment_name (str) – the experiment name, None for use default name.
- kwargs – the params for end_train_func.
Returns: a list of Recorders
Return type: List[Recorder]
-
worker
(end_train_func=None, experiment_name: str = None)¶ The multiprocessing method for end_train. It can share the same task_pool with end_train and can run in other processes or on other machines.
Parameters: - end_train_func (Callable, optional) – the end_train method, which needs at least recorders and experiment_name. Defaults to None for using self.end_train_func.
- experiment_name (str) – the experiment name, None for use default name.
-
Collector¶
The Collector module can collect objects from different sources and process them, for example by merging, grouping, or averaging.
-
class
qlib.workflow.task.collect.
Collector
(process_list=[])¶ The collector to collect different results
-
__init__
(process_list=[])¶ Init Collector.
Parameters: process_list (list or Callable) – the list of processors or the instance of a processor to process dict.
-
collect
() → dict¶ Collect the results and return a dict like {key: things}
Returns: the dict after collecting. For example:
{"prediction": pd.Series}
{"IC": {"Xgboost": pd.Series, "LSTM": pd.Series}}
Return type: dict
-
static
process_collect
(collected_dict, process_list=[], *args, **kwargs) → dict¶ Apply a series of processors to the dict returned by collect and return a dict like {key: things}. For example, you can group and ensemble.
Parameters: - collected_dict (dict) – the dict returned by collect
- process_list (list or Callable) – the list of processors or the instance of a processor to process dict. The processor order is the same as the list order. For example: [Group1(…, Ensemble1()), Group2(…, Ensemble2())]
Returns: the dict after processing.
Return type: dict
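A minimal sketch of the processing step, assuming each processor is a plain callable that takes a value and returns a new value. process_collect_sketch, drop_empty and mean_of_values are hypothetical helpers, not part of Qlib.

```python
from typing import Callable, Dict, List, Union


def process_collect_sketch(
    collected: Dict, process_list: Union[Callable, List[Callable]]
) -> Dict:
    """Apply each processor, in list order, to every value of the collected dict."""
    if callable(process_list):
        # A single processor is accepted as well as a list of processors.
        process_list = [process_list]
    result = {}
    for key, value in collected.items():
        for process in process_list:
            value = process(value)
        result[key] = value
    return result


def drop_empty(d: dict) -> dict:
    # First processor: remove entries that hold no data.
    return {k: v for k, v in d.items() if v}


def mean_of_values(d: dict) -> float:
    # Second processor: average all remaining numbers.
    vals = [x for v in d.values() for x in v]
    return sum(vals) / len(vals)


collected = {"IC": {"Xgboost": [0.25, 0.75], "LSTM": [], "GBDT": [0.5]}}
processed = process_collect_sketch(collected, [drop_empty, mean_of_values])
```

The order matters: mean_of_values only works here because drop_empty has already run on the same value.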
-
-
class
qlib.workflow.task.collect.
MergeCollector
(collector_dict: Dict[str, qlib.workflow.task.collect.Collector], process_list: List[Callable] = [], merge_func=None)¶ A collector to collect the results of other Collectors
For example:
We have two collectors, named A and B. A can collect {"prediction": pd.Series} and B can collect {"IC": {"Xgboost": pd.Series, "LSTM": pd.Series}}. After this class's collect, we get {"A_prediction": pd.Series, "B_IC": {"Xgboost": pd.Series, "LSTM": pd.Series}}.
-
__init__
(collector_dict: Dict[str, qlib.workflow.task.collect.Collector], process_list: List[Callable] = [], merge_func=None)¶ Init MergeCollector.
Parameters: - collector_dict (Dict[str,Collector]) – a dict like {collector_key: Collector}
- process_list (List[Callable]) – the list of processors or the instance of a processor to process dict.
- merge_func (Callable) – a method to generate the outermost key. The given params are collector_key from collector_dict and key from every collector after collecting. None for using a tuple to connect them, such as "ABC"+("a","b") -> ("ABC", ("a","b")).
-
collect
() → dict¶ Collect all results of collector_dict and change the outermost key to a recombination key.
Returns: the dict after collecting. Return type: dict
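The key recombination can be sketched as follows. This is an illustration only: collectors are modelled as zero-argument callables returning dicts, and merge_collect_sketch is a hypothetical name.

```python
from typing import Callable, Dict, Optional


def merge_collect_sketch(
    collector_dict: Dict[str, Callable[[], dict]],
    merge_func: Optional[Callable] = None,
) -> dict:
    """Collect from every collector and rebuild the outermost key."""
    merged = {}
    for collector_key, collect in collector_dict.items():
        for key, value in collect().items():
            if merge_func is not None:
                new_key = merge_func(collector_key, key)
            else:
                # Default: connect the two keys with a tuple.
                new_key = (collector_key, key)
            merged[new_key] = value
    return merged


collectors = {
    "A": lambda: {"prediction": [0.1, 0.2]},
    "B": lambda: {"IC": {"Xgboost": [0.03], "LSTM": [0.05]}},
}
default_merged = merge_collect_sketch(collectors)
named_merged = merge_collect_sketch(collectors, merge_func=lambda ck, k: f"{ck}_{k}")
```

With the string-joining merge_func the keys become "A_prediction" and "B_IC", matching the example in the class description; without it they are the tuples ("A", "prediction") and ("B", "IC").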
-
-
class
qlib.workflow.task.collect.
RecorderCollector
(experiment, process_list=[], rec_key_func=None, rec_filter_func=None, artifacts_path={'pred': 'pred.pkl'}, artifacts_key=None, list_kwargs={})¶
-
__init__
(experiment, process_list=[], rec_key_func=None, rec_filter_func=None, artifacts_path={'pred': 'pred.pkl'}, artifacts_key=None, list_kwargs={})¶ Init RecorderCollector.
Parameters: - experiment (Experiment or str) – an instance of an Experiment or the name of an Experiment
- process_list (list or Callable) – the list of processors or the instance of a processor to process dict.
- rec_key_func (Callable) – a function to get the key of a recorder. If None, use recorder id.
- rec_filter_func (Callable, optional) – filter the recorders by returning True or False. Defaults to None.
- artifacts_path (dict, optional) – the artifact names and their paths in the Recorder. Defaults to {"pred": "pred.pkl"}.
- artifacts_key (str or List, optional) – the artifacts key you want to get. If None, get all artifacts.
- list_kwargs (dict) – arguments for list_recorders function.
-
collect
(artifacts_key=None, rec_filter_func=None, only_exist=True) → dict¶ Collect different artifacts based on recorder after filtering.
Parameters: - artifacts_key (str or List, optional) – the artifacts key you want to get. If None, use the default.
- rec_filter_func (Callable, optional) – filter the recorders by returning True or False. If None, use the default.
- only_exist (bool, optional) – whether to collect an artifact only when the recorder really has it. If True, a recorder that raises an exception when loading will not be collected. If False, the exception will be raised.
Returns: the collected dict, like {artifact: {rec_key: object}}
Return type: dict
-
get_exp_name
() → str¶ Get experiment name
Returns: experiment name Return type: str
-
Group¶
Group can group a set of objects based on group_func and turn them into a dict. After grouping, we provide a method to reduce them.
For example:
group: {(A,B,C1): object, (A,B,C2): object} -> {(A,B): {C1: object, C2: object}}
reduce: {(A,B): {C1: object, C2: object}} -> {(A,B): object}
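The two steps can be tried out on plain dicts. The sketch below is a simplification: its group_func works on one key at a time (the real group_func receives the dict), and split_last, group_sketch and reduce_sketch are hypothetical names.

```python
from typing import Callable, Dict, Tuple


def split_last(key: tuple) -> Tuple[tuple, object]:
    """Hypothetical group_func: (A, B, C1) -> group key (A, B), element key C1."""
    return key[:-1], key[-1]


def group_sketch(ungrouped: Dict[tuple, object], group_func: Callable = split_last) -> dict:
    grouped: dict = {}
    for key, value in ungrouped.items():
        group_key, element_key = group_func(key)
        grouped.setdefault(group_key, {})[element_key] = value
    return grouped


def reduce_sketch(grouped: dict, ens: Callable[[dict], object]) -> dict:
    # ens plays the role of an Ensemble: it merges one group into one object.
    return {group_key: ens(members) for group_key, members in grouped.items()}


flat = {("A", "B", "C1"): 1.0, ("A", "B", "C2"): 3.0}
grouped = group_sketch(flat)
reduced = reduce_sketch(grouped, ens=lambda members: sum(members.values()) / len(members))
```

Here the ensemble step is a simple mean, so the two objects under (A, B) collapse into the single value 2.0.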
-
class
qlib.model.ens.group.
Group
(group_func=None, ens: qlib.model.ens.ensemble.Ensemble = None)¶ Group the objects based on dict
-
__init__
(group_func=None, ens: qlib.model.ens.ensemble.Ensemble = None)¶ Init Group.
Parameters: - group_func (Callable, optional) – Given a dict, return the group key and one of the group elements. For example: {(A,B,C1): object, (A,B,C2): object} -> {(A,B): {C1: object, C2: object}}. Defaults to None.
- ens (Ensemble, optional) – If not None, do ensemble for grouped value after grouping.
-
group
(*args, **kwargs) → dict¶ Group a set of objects and change them to a dict.
For example: {(A,B,C1): object, (A,B,C2): object} -> {(A,B): {C1: object, C2: object}}
Returns: grouped dict Return type: dict
-
reduce
(*args, **kwargs) → dict¶ Reduce grouped dict.
For example: {(A,B): {C1: object, C2: object}} -> {(A,B): object}
Returns: reduced dict Return type: dict
-
-
class
qlib.model.ens.group.
RollingGroup
¶ Group the rolling dict
-
group
(rolling_dict: dict) → dict¶ Given a rolling dict like {(A,B,R): things}, return the grouped dict like {(A,B): {R:things}}
NOTE: This assumes that the rolling key is at the end of the key tuple, because the rolling results always need to be ensembled first.
Parameters: rolling_dict (dict) – a rolling dict. If the key is not a tuple, then do nothing. Returns: grouped dict Return type: dict
-
__init__
()¶ Init Group.
Parameters: - group_func (Callable, optional) – Given a dict, return the group key and one of the group elements. For example: {(A,B,C1): object, (A,B,C2): object} -> {(A,B): {C1: object, C2: object}}. Defaults to None.
- ens (Ensemble, optional) – If not None, do ensemble for grouped value after grouping.
-
Ensemble¶
The Ensemble module can merge the objects in an Ensemble. For example, if there are predictions from many submodels, we may need to merge them into an ensemble prediction.
-
class
qlib.model.ens.ensemble.
Ensemble
¶ Merge the ensemble_dict into an ensemble object.
For example: {Rollinga_b: object, Rollingb_c: object} -> object
When calling this class:
- Args:
- ensemble_dict (dict): the ensemble dict like {name: things} waiting for merging
- Returns:
- object: the ensemble object
-
class
qlib.model.ens.ensemble.
SingleKeyEnsemble
¶ Extract the object if there is only one key and value in the dict, which makes the result more readable. {Only key: Only value} -> Only value
If the dict has more than one key or no key at all, then do nothing. This can also be run recursively to make the dict more readable.
NOTE: Default runs recursively.
When calling this class:
- Args:
- ensemble_dict (dict): the dict. The key of the dict will be ignored.
- Returns:
- dict: the readable dict.
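The unwrapping rule is short enough to sketch directly. single_key_unwrap is a hypothetical function name, and the recursion flag is an assumption modelled on the note that the default runs recursively.

```python
def single_key_unwrap(obj, recursion: bool = True):
    """Replace {only_key: only_value} by only_value; leave other dicts untouched."""
    if not isinstance(obj, dict):
        return obj
    if recursion:
        # Unwrap the nested values first, then look at this level.
        obj = {k: single_key_unwrap(v, recursion) for k, v in obj.items()}
    if len(obj) == 1:
        return next(iter(obj.values()))
    return obj


fully_unwrapped = single_key_unwrap({"only": {"inner": 42}})
one_level = single_key_unwrap({"only": {"inner": 42}}, recursion=False)
mixed = single_key_unwrap({"a": {"x": 1}, "b": 2})
```

In the mixed case the outer dict keeps both keys, but the single-key value under "a" is still simplified because the recursion visits it first.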
-
class
qlib.model.ens.ensemble.
RollingEnsemble
¶ Merge a dict of rolling dataframe like prediction or IC into an ensemble.
NOTE: The values of dict must be pd.DataFrame, and have the index “datetime”.
When calling this class:
- Args:
- ensemble_dict (dict): a dict like {“A”: pd.DataFrame, “B”: pd.DataFrame}. The key of the dict will be ignored.
- Returns:
- pd.DataFrame: the complete result of rolling.
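The idea of stitching rolling results into one series can be illustrated without pandas. In this sketch each rolling result is a plain dict keyed by ISO date strings rather than a pd.DataFrame, rolling_merge_sketch is a hypothetical name, and the rule that a later segment wins on overlapping dates is an assumption made for the example.

```python
from typing import Dict


def rolling_merge_sketch(ensemble_dict: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Merge rolling segments into one date-ordered result; dict keys are ignored."""
    # Order the segments by their earliest date (ISO strings sort chronologically).
    segments = sorted(ensemble_dict.values(), key=lambda seg: min(seg))
    merged: Dict[str, float] = {}
    for seg in segments:
        merged.update(seg)  # assumption: on overlap, the later segment wins
    return dict(sorted(merged.items()))


rolling = {
    "B": {"2021-02-01": 0.3, "2021-02-02": 0.4},
    "A": {"2021-01-29": 0.1, "2021-02-01": 0.2},
}
complete = rolling_merge_sketch(rolling)
```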
-
class
qlib.model.ens.ensemble.
AverageEnsemble
¶ Average and standardize a dict of same shape dataframe like prediction or IC into an ensemble.
NOTE: The values of dict must be pd.DataFrame, and have the index “datetime”. If it is a nested dict, then flatten it.
When calling this class:
- Args:
- ensemble_dict (dict): a dict like {“A”: pd.DataFrame, “B”: pd.DataFrame}. The key of the dict will be ignored.
- Returns:
- pd.DataFrame: the complete result of averaging and standardizing.
Utils¶
Some tools for task management.
-
qlib.workflow.task.utils.
get_mongodb
() → pymongo.database.Database¶ Get database in MongoDB, which means you need to declare the address and the name of a database at first.
For example:
Using qlib.init():
    mongo_conf = {
        "task_url": task_url,  # your MongoDB url
        "task_db_name": task_db_name,  # database name
    }
    qlib.init(…, mongo=mongo_conf)
After qlib.init():
    C["mongo"] = {
        "task_url": "mongodb://localhost:27017/",
        "task_db_name": "rolling_db"
    }
Returns: the Database instance Return type: Database
-
qlib.workflow.task.utils.
list_recorders
(experiment, rec_filter_func=None)¶ List all recorders which can pass the filter in an experiment.
Parameters: - experiment (str or Experiment) – the name of an Experiment or an instance
- rec_filter_func (Callable, optional) – return True to retain the given recorder. Defaults to None.
Returns: a dict {rid: recorder} after filtering.
Return type: dict
-
class
qlib.workflow.task.utils.
TimeAdjuster
(future=True, end_time=None)¶ Find appropriate date and adjust date.
-
__init__
(future=True, end_time=None)¶ Initialize self. See help(type(self)) for accurate signature.
-
set_end_time
(end_time=None)¶ Set end time. None for use calendar’s end time.
Parameters: end_time –
-
get
(idx: int)¶ Get datetime by index.
Parameters: idx (int) – index of the calendar
-
max
() → pandas._libs.tslibs.timestamps.Timestamp¶ Return the max calendar datetime
-
align_idx
(time_point, tp_type='start') → int¶ Align the index of time_point in the calendar.
Parameters: - time_point –
- tp_type (str) –
Returns: index
Return type: int
-
cal_interval
(time_point_A, time_point_B) → int¶ Calculate the trading day interval (time_point_A - time_point_B)
Parameters: - time_point_A – time_point_A
- time_point_B – time_point_B (is the past of time_point_A)
Returns: the interval between A and B
Return type: int
-
align_time
(time_point, tp_type='start') → pandas._libs.tslibs.timestamps.Timestamp¶ Align time_point to trade date of calendar
Parameters: - time_point – Time point
- tp_type – str time point type (“start”, “end”)
Returns: pd.Timestamp
-
align_seg
(segment: Union[dict, tuple]) → Union[dict, tuple]¶ Align the given date to the trade date
for example:
input: {'train': ('2008-01-01', '2014-12-31'), 'valid': ('2015-01-01', '2016-12-31'), 'test': ('2017-01-01', '2020-08-01')}
output: {'train': (Timestamp('2008-01-02 00:00:00'), Timestamp('2014-12-31 00:00:00')), 'valid': (Timestamp('2015-01-05 00:00:00'), Timestamp('2016-12-30 00:00:00')), 'test': (Timestamp('2017-01-03 00:00:00'), Timestamp('2020-07-31 00:00:00'))}
Parameters: segment – Returns: the start and end trade date (pd.Timestamp) between the given start and end date. Return type: Union[dict, tuple]
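Alignment of this kind is commonly done with a binary search over the sorted calendar. The sketch below assumes that approach and uses a toy calendar; CAL and the three helper functions are hypothetical stand-ins, not the TimeAdjuster implementation.

```python
import bisect
from datetime import date
from typing import List, Tuple

# Toy trading calendar (hypothetical): 2008-01-05 and 2008-01-06 are a weekend.
CAL: List[date] = [date(2008, 1, 2), date(2008, 1, 3), date(2008, 1, 4), date(2008, 1, 7)]


def align_idx(cal: List[date], time_point: date, tp_type: str = "start") -> int:
    if tp_type == "start":
        # First trading day on or after time_point.
        return bisect.bisect_left(cal, time_point)
    if tp_type == "end":
        # Last trading day on or before time_point.
        return bisect.bisect_right(cal, time_point) - 1
    raise ValueError(f"unsupported tp_type: {tp_type}")


def align_time(cal: List[date], time_point: date, tp_type: str = "start") -> date:
    idx = align_idx(cal, time_point, tp_type)
    if not 0 <= idx < len(cal):
        raise IndexError("time_point is outside the calendar")
    return cal[idx]


def align_seg(cal: List[date], seg: Tuple[date, date]) -> Tuple[date, date]:
    start, end = seg
    return align_time(cal, start, "start"), align_time(cal, end, "end")


aligned = align_seg(CAL, (date(2008, 1, 1), date(2008, 1, 6)))
```

The start moves forward to the first trading day and the end moves backward to the last one, which is the same direction of adjustment as in the input/output example above.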
-
truncate
(segment: tuple, test_start, days: int) → tuple¶ Truncate the segment based on the test_start date
Parameters: - segment (tuple) – time segment
- test_start –
- days (int) – the number of trading days to be truncated; the data in this segment may need ‘days’ days of data
Returns: new segment
Return type: tuple
-
shift
(seg: tuple, step: int, rtype='sliding') → tuple¶ Shift the datetime of segment
Parameters: - seg – datetime segment
- step (int) – rolling step
- rtype (str) – rolling type (“sliding” or “expanding”)
Returns: new segment
Return type: tuple
Raises: KeyError – shift will raise an error if the index (both start and end) is out of self.cal
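The difference between the two rolling types can be shown on a toy calendar. This is a sketch under stated assumptions: "sliding" is taken to move both ends by step, "expanding" to move only the end, and the error check is simplified to raise KeyError as soon as either index leaves the calendar. CAL and shift_sketch are hypothetical names.

```python
from datetime import date
from typing import List, Tuple

# Toy trading calendar (hypothetical): five consecutive trading days.
CAL: List[date] = [date(2021, 1, d) for d in (4, 5, 6, 7, 8)]


def shift_sketch(
    cal: List[date], seg: Tuple[date, date], step: int, rtype: str = "sliding"
) -> Tuple[date, date]:
    # The segment is assumed to be already aligned to calendar days.
    start_idx, end_idx = cal.index(seg[0]), cal.index(seg[1])
    if rtype == "sliding":
        start_idx += step
        end_idx += step
    elif rtype == "expanding":
        end_idx += step
    else:
        raise ValueError(f"unsupported rtype: {rtype}")
    if not (0 <= start_idx < len(cal) and 0 <= end_idx < len(cal)):
        raise KeyError("the shifted segment is outside the calendar")
    return cal[start_idx], cal[end_idx]


seg = (date(2021, 1, 4), date(2021, 1, 6))
sliding = shift_sketch(CAL, seg, step=2, rtype="sliding")
expanding = shift_sketch(CAL, seg, step=2, rtype="expanding")
```

A sliding window keeps its length while moving forward; an expanding window keeps its start and grows, so more history is included with every step.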
-
Online Serving¶
Online Manager¶
OnlineManager can manage a set of Online Strategy and run them dynamically.
As time passes, the decisive models change as well. In this module, we call those contributing models online models. In every routine (such as every day or every minute), the online models may change and their predictions need to be updated. So this module provides a series of methods to control this process.
This module also provides a method to simulate Online Strategy on historical data, which means you can verify your strategy or find a better one.
There are four situations, combining online serving or simulation with the two kinds of trainer:
Situations | Description |
---|---|
Online + Trainer | When you want to do a REAL routine, the Trainer will help you train the models. It will train models task by task and strategy by strategy. |
Online + DelayTrainer | When your models don’t have any temporal dependence, the DelayTrainer trains nothing until all tasks have been prepared. This lets users train all tasks at the end of routine or first_train. |
Simulation + Trainer | When your models have some temporal dependence on the previous models, you need to consider using Trainer. It will really train your models in every routine and prepare signals for every routine. |
Simulation + DelayTrainer | When your models don’t have any temporal dependence, you can use DelayTrainer for its ability to multitask. All tasks in all routines can then be really trained at the end of the simulation. The signals will be prepared for different time segments (based on whether or not any new model is online). |
-
class
qlib.workflow.online.manager.
OnlineManager
(strategies: Union[qlib.workflow.online.strategy.OnlineStrategy, List[qlib.workflow.online.strategy.OnlineStrategy]], trainer: qlib.model.trainer.Trainer = None, begin_time: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = None, freq='day')¶ OnlineManager can manage online models with Online Strategy. It also provides a history recording of which models are online at what time.
-
__init__
(strategies: Union[qlib.workflow.online.strategy.OnlineStrategy, List[qlib.workflow.online.strategy.OnlineStrategy]], trainer: qlib.model.trainer.Trainer = None, begin_time: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = None, freq='day')¶ Init OnlineManager. One OnlineManager must have at least one OnlineStrategy.
Parameters: - strategies (Union[OnlineStrategy, List[OnlineStrategy]]) – an instance of OnlineStrategy or a list of OnlineStrategy
- begin_time (Union[str,pd.Timestamp], optional) – the OnlineManager will begin at this time. Defaults to None for using the latest date.
- trainer (Trainer) – the trainer to train task. None for using TrainerR.
- freq (str, optional) – data frequency. Defaults to “day”.
-
first_train
(strategies: List[qlib.workflow.online.strategy.OnlineStrategy] = None, model_kwargs: dict = {})¶ Get tasks from every strategy’s first_tasks method and train them. If using DelayTrainer, it can finish training all together after every strategy’s first_tasks.
Parameters: - strategies (List[OnlineStrategy]) – the strategies list (need this param when adding strategies). None for use default strategies.
- model_kwargs (dict) – the params for prepare_online_models
-
routine
(cur_time: Union[str, pandas._libs.tslibs.timestamps.Timestamp] = None, task_kwargs: dict = {}, model_kwargs: dict = {}, signal_kwargs: dict = {})¶ Typical update process for every strategy and record the online history.
The typical update process after a routine, such as day by day or month by month. The process is: Update predictions -> Prepare tasks -> Prepare online models -> Prepare signals.
If using DelayTrainer, it can finish training all together after every strategy’s prepare_tasks.
Parameters: - cur_time (Union[str,pd.Timestamp], optional) – run routine method in this time. Defaults to None.
- task_kwargs (dict) – the params for prepare_tasks
- model_kwargs (dict) – the params for prepare_online_models
- signal_kwargs (dict) – the params for prepare_signals
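The order of the four steps in one routine can be traced with a toy loop. Everything below is hypothetical scaffolding (ToyStrategy, toy_train, routine_sketch); in particular, training is replaced by a string-building stand-in and the prediction update is only logged.

```python
from typing import Dict, List


class ToyStrategy:
    """Hypothetical strategy with the two hooks used in a routine."""

    def __init__(self, name_id: str):
        self.name_id = name_id
        self.online: List[str] = []

    def prepare_tasks(self, cur_time: str) -> List[dict]:
        return [{"strategy": self.name_id, "time": cur_time}]

    def prepare_online_models(self, trained_models: List[str]) -> List[str]:
        # Put all trained models online; keep the old ones if nothing was trained.
        if trained_models:
            self.online = trained_models
        return self.online


def toy_train(tasks: List[dict]) -> List[str]:
    # Stand-in for a trainer: one "model" name per task.
    return [f"{t['strategy']}@{t['time']}" for t in tasks]


def routine_sketch(strategies: List[ToyStrategy], cur_time: str, history: dict, log: List[str]) -> None:
    for strategy in strategies:
        log.append("update predictions")
        tasks = strategy.prepare_tasks(cur_time)
        log.append("prepare tasks")
        online = strategy.prepare_online_models(toy_train(tasks))
        log.append("prepare online models")
        # Record which models are online at this time.
        history.setdefault(strategy.name_id, {})[cur_time] = online
    log.append("prepare signals")


history: Dict[str, Dict[str, List[str]]] = {}
log: List[str] = []
routine_sketch([ToyStrategy("rolling")], "2021-01-04", history, log)
```

The history dict plays the role of the online history mentioned above: for each strategy it maps a routine time to the models that were online at that time.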
-
get_collector
(**kwargs) → qlib.workflow.task.collect.MergeCollector¶ Get the instance of Collector to collect results from every strategy. This collector can be a basis as the signals preparation.
Parameters: **kwargs – the params for get_collector. Returns: the collector to merge other collectors. Return type: MergeCollector
-
add_strategy
(strategies: Union[qlib.workflow.online.strategy.OnlineStrategy, List[qlib.workflow.online.strategy.OnlineStrategy]])¶ Add some new strategies to OnlineManager.
Parameters: strategies (Union[OnlineStrategy, List[OnlineStrategy]]) – an OnlineStrategy or a list of OnlineStrategy
-
prepare_signals
(prepare_func: Callable = <qlib.model.ens.ensemble.AverageEnsemble object>, over_write=False)¶ After preparing the data of the last routine (a box in box-plot), which marks the end of the routine, we can prepare trading signals for the next routine.
NOTE: Given a set of predictions, all signals before the end times of these predictions will be prepared.
Even if the latest signal already exists, it will be overwritten by the latest calculation result.
Parameters: - prepare_func (Callable, optional) – Get signals from a dict after collecting. Defaults to AverageEnsemble(), the results collected by MergeCollector must be {xxx:pred}.
- over_write (bool, optional) – If True, the new signals will overwrite. If False, the new signals will append to the end of signals. Defaults to False.
Returns: the signals.
Return type: pd.DataFrame
-
get_signals
() → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]¶ Get prepared online signals.
Returns: pd.Series when there is only one signal per datetime. pd.DataFrame for multiple signals, for example, when buy and sell operations use different trading signals. Return type: Union[pd.Series, pd.DataFrame]
-
simulate
(end_time=None, frequency='day', task_kwargs={}, model_kwargs={}, signal_kwargs={}) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]¶ Starting from the current time, this method will simulate every routine in OnlineManager until the end time.
Considering the parallel training, the models and signals can be prepared after all routine simulating.
Delayed training can be done with DelayTrainer, and delayed signal preparation with delay_prepare.
Parameters: - end_time – the time the simulation will end
- frequency – the calendar frequency
- task_kwargs (dict) – the params for prepare_tasks
- model_kwargs (dict) – the params for prepare_online_models
- signal_kwargs (dict) – the params for prepare_signals
Returns: pd.Series when there is only one signal per datetime. pd.DataFrame for multiple signals, for example, when buy and sell operations use different trading signals.
Return type: Union[pd.Series, pd.DataFrame]
-
delay_prepare
(model_kwargs={}, signal_kwargs={})¶ Prepare all models and signals if something is waiting for preparation.
Parameters: - model_kwargs – the params for end_train
- signal_kwargs – the params for prepare_signals
-
Online Strategy¶
OnlineStrategy module is an element of online serving.
-
class
qlib.workflow.online.strategy.
OnlineStrategy
(name_id: str)¶ OnlineStrategy works with Online Manager and is responsible for how the tasks are generated, the models are updated and the signals are prepared.
-
__init__
(name_id: str)¶ Init OnlineStrategy. This module MUST use Trainer to finish model training.
Parameters: - name_id (str) – a unique name or id.
- trainer (Trainer, optional) – an instance of Trainer. Defaults to None.
-
prepare_tasks
(cur_time, **kwargs) → List[dict]¶ After the end of a routine, check whether we need to prepare and train some new tasks based on cur_time (None for latest). Return the new tasks waiting for training.
You can find the last online models by OnlineTool.online_models.
-
prepare_online_models
(trained_models, cur_time=None) → List[object]¶ Select some models from the trained models and set them as online models. This typical implementation puts all trained models online; you can override it to implement a more complex method. You can find the last online models by OnlineTool.online_models if you still need them.
NOTE: Reset all online models to trained models. If there are no trained models, then do nothing.
NOTE: The current implementation is very naive. Here is a more complex situation that is closer to practical scenarios:
1. Train new models on the day before test_start (at timestamp T)
2. Switch models at test_start (typically at timestamp T + 1)
Parameters: - trained_models (list) – a list of models.
- cur_time (pd.Timestamp) – current time from OnlineManager. None for the latest.
Returns: a list of online models.
Return type: List[object]
-
first_tasks
() → List[dict]¶ Generate the first series of tasks and return them.
-
-
class
qlib.workflow.online.strategy.
RollingStrategy
(name_id: str, task_template: Union[dict, List[dict]], rolling_gen: qlib.workflow.task.gen.RollingGen)¶ This example strategy always uses the latest rolling models as online models.
-
__init__
(name_id: str, task_template: Union[dict, List[dict]], rolling_gen: qlib.workflow.task.gen.RollingGen)¶ Init RollingStrategy.
Assumption: the str of name_id, the experiment name, and the trainer’s experiment name are the same.
Parameters: - name_id (str) – a unique name or id. Will be also the name of the Experiment.
- task_template (Union[dict, List[dict]]) – a list of task_template or a single template, which will be used to generate many tasks using rolling_gen.
- rolling_gen (RollingGen) – an instance of RollingGen
-
get_collector
(process_list=[<qlib.model.ens.group.RollingGroup object>], rec_key_func=None, rec_filter_func=None, artifacts_key=None)¶ Get the instance of Collector to collect results. The returned collector must distinguish results in different models.
Assumption: the models can be distinguished based on the model name and rolling test segments. If you do not want this assumption, please implement your method or use another rec_key_func.
Parameters: - rec_key_func (Callable) – a function to get the key of a recorder. If None, use recorder id.
- rec_filter_func (Callable, optional) – filter the recorder by return True or False. Defaults to None.
- artifacts_key (List[str], optional) – the artifacts key you want to get. If None, get all artifacts.
-
first_tasks
() → List[dict]¶ Use rolling_gen to generate different tasks based on task_template.
Returns: a list of tasks Return type: List[dict]
-
prepare_tasks
(cur_time) → List[dict]¶ Prepare new tasks based on cur_time (None for the latest).
You can find the last online models by OnlineToolR.online_models.
Returns: a list of new tasks. Return type: List[dict]
-
Online Tool¶
OnlineTool is a module to set and unset a series of online models. The online models are the decisive models at certain time points, and they can change over time. This allows us to use efficient submodels as the market style changes.
-
class
qlib.workflow.online.utils.
OnlineTool
¶ OnlineTool will manage online models in an experiment that includes the model recorders.
-
__init__
()¶ Init OnlineTool.
-
set_online_tag
(tag, recorder: Union[list, object])¶ Set a tag on the model to mark whether it is online.
Parameters: - tag (str) – the tags in ONLINE_TAG, OFFLINE_TAG
- recorder (Union[list,object]) – the model’s recorder
-
get_online_tag
(recorder: object) → str¶ Given a model recorder, return its online tag.
Parameters: recorder (Object) – the model’s recorder Returns: the online tag Return type: str
-
reset_online_tag
(recorder: Union[list, object])¶ Offline all models and set the recorders to ‘online’.
Parameters: recorder (Union[list,object]) – the recorder you want to reset to ‘online’.
-
online_models
() → list¶ Get current online models
Returns: a list of online models. Return type: list
-
update_online_pred
(to_date=None)¶ Update the predictions of online models to to_date.
Parameters: to_date (pd.Timestamp) – the pred before this date will be updated. None for updating to the latest.
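The tag bookkeeping behind these methods can be sketched with plain dicts standing in for recorders. ToyOnlineTool is hypothetical, and the tag values "online" and "offline" are assumptions chosen for the example.

```python
from typing import List, Union

ONLINE_TAG = "online"    # assumed tag values, for illustration only
OFFLINE_TAG = "offline"


class ToyOnlineTool:
    """Hypothetical in-memory tool; recorders are plain dicts with a 'tag' entry."""

    def __init__(self, recorders: List[dict]):
        self.recorders = recorders

    def set_online_tag(self, tag: str, recorder: Union[dict, List[dict]]) -> None:
        recs = recorder if isinstance(recorder, list) else [recorder]
        for rec in recs:
            rec["tag"] = tag

    def get_online_tag(self, recorder: dict) -> str:
        # Recorders that were never tagged count as offline.
        return recorder.get("tag", OFFLINE_TAG)

    def reset_online_tag(self, recorder: Union[dict, List[dict]]) -> None:
        # Offline everything first, then put only the given recorders online.
        self.set_online_tag(OFFLINE_TAG, self.recorders)
        self.set_online_tag(ONLINE_TAG, recorder)

    def online_models(self) -> List[dict]:
        return [r for r in self.recorders if self.get_online_tag(r) == ONLINE_TAG]


recs = [{"id": "r1"}, {"id": "r2"}, {"id": "r3"}]
tool = ToyOnlineTool(recs)
tool.reset_online_tag(recs[0])
first_online = [r["id"] for r in tool.online_models()]
tool.reset_online_tag([recs[1], recs[2]])
second_online = [r["id"] for r in tool.online_models()]
```

The second reset shows why reset differs from set: r1 goes offline automatically when r2 and r3 take over.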
-
-
class
qlib.workflow.online.utils.
OnlineToolR
(default_exp_name: str = None)¶ The implementation of OnlineTool based on (R)ecorder.
-
__init__
(default_exp_name: str = None)¶ Init OnlineToolR.
Parameters: default_exp_name (str) – the default experiment name.
-
set_online_tag
(tag, recorder: Union[qlib.workflow.recorder.Recorder, List[T]])¶ Set a tag on the model’s recorder to mark whether it is online.
Parameters: - tag (str) – the tags in ONLINE_TAG, NEXT_ONLINE_TAG, OFFLINE_TAG
- recorder (Union[Recorder, List]) – a list of Recorder or an instance of Recorder
-
get_online_tag
(recorder: qlib.workflow.recorder.Recorder) → str¶ Given a model recorder, return its online tag.
Parameters: recorder (Recorder) – an instance of recorder Returns: the online tag Return type: str
-
reset_online_tag
(recorder: Union[qlib.workflow.recorder.Recorder, List[T]], exp_name: str = None)¶ Offline all models and set the recorders to ‘online’.
Parameters: - recorder (Union[Recorder, List]) – the recorder you want to reset to ‘online’.
- exp_name (str) – the experiment name. If None, then use default_exp_name.
-
online_models
(exp_name: str = None) → list¶ Get current online models
Parameters: exp_name (str) – the experiment name. If None, then use default_exp_name. Returns: a list of online models. Return type: list
-
update_online_pred
(to_date=None, exp_name: str = None)¶ Update the predictions of online models to to_date.
Parameters: - to_date (pd.Timestamp) – the pred before this date will be updated. None for updating to latest time in Calendar.
- exp_name (str) – the experiment name. If None, then use default_exp_name.
-
RecordUpdater¶
Updater is a module to update artifacts such as predictions when the stock data is updated.
-
class
qlib.workflow.online.update.
RMDLoader
(rec: qlib.workflow.recorder.Recorder)¶ Recorder Model Dataset Loader
-
__init__
(rec: qlib.workflow.recorder.Recorder)¶ Initialize self. See help(type(self)) for accurate signature.
-
get_dataset
(start_time, end_time, segments=None) → qlib.data.dataset.DatasetH¶ Load, config and setup dataset.
This dataset is for inference.
Parameters: - start_time – the start_time of underlying data
- end_time – the end_time of underlying data
- segments – dict, the segments config for the dataset. For a time series dataset (TSDatasetH), the test segments may differ from start_time and end_time
Returns: the instance of DatasetH
Return type: DatasetH
-
-
class
qlib.workflow.online.update.
RecordUpdater
(record: qlib.workflow.recorder.Recorder, *args, **kwargs)¶ Update a specific recorder
-
__init__
(record: qlib.workflow.recorder.Recorder, *args, **kwargs)¶ Initialize self. See help(type(self)) for accurate signature.
-
update
(*args, **kwargs)¶ Update info for specific recorder
-
-
class
qlib.workflow.online.update.
DSBasedUpdater
(record: qlib.workflow.recorder.Recorder, to_date=None, hist_ref: int = 0, freq='day', fname='pred.pkl')¶ Dataset-Based Updater: provides the feature of updating data based on a Qlib Dataset
Assumptions: the updater is based on a Qlib dataset, and the data to be updated is a multi-level index pd.DataFrame, for example a label or a prediction:
datetime | instrument | LABEL0 |
---|---|---|
2021-05-10 | SH600000 | 0.006965 |
2021-05-10 | SH600004 | 0.003407 |
… | … | |
2021-05-28 | SZ300498 | 0.015748 |
2021-05-28 | SZ300676 | -0.001321 |
-
__init__
(record: qlib.workflow.recorder.Recorder, to_date=None, hist_ref: int = 0, freq='day', fname='pred.pkl')¶ Init PredUpdater.
Parameters: - record – Recorder
- to_date – update the prediction up to to_date
- hist_ref – int. Sometimes the dataset depends on historical data. It is left to users to set the length of this historical dependency.
Note
the start_time is not included in the hist_ref
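The interaction between to_date and hist_ref can be pictured with a small standalone sketch. It does not use Qlib and is not the library's implementation: the function name `load_window`, the `last_end` argument, and the plain list calendar are all assumptions made for illustration. It only shows one reading of the note above, in which the first day to update is not counted among the hist_ref extra days.

```python
from datetime import date


def load_window(calendar, last_end, to_date=None, hist_ref=0):
    """Return (load_start, update_start, update_end), or None if up to date.

    calendar : sorted list of trading days (hypothetical stand-in for a calendar)
    last_end : last day already covered by the stored data
    to_date  : last day to update to; None means the latest calendar day
    hist_ref : number of extra trading days to load before update_start
    """
    update_end = calendar[-1] if to_date is None else to_date
    # First trading day strictly after the stored data.
    start_idx = calendar.index(last_end) + 1
    if start_idx >= len(calendar) or calendar[start_idx] > update_end:
        return None  # nothing new to update
    # Step back hist_ref days; update_start itself is not counted in hist_ref.
    load_idx = max(0, start_idx - hist_ref)
    return calendar[load_idx], calendar[start_idx], update_end


calendar = [date(2021, 5, d) for d in (10, 11, 12, 13, 14, 17, 18, 19)]
window = load_window(calendar, last_end=date(2021, 5, 13), hist_ref=2)
print(window)
```

With stored data ending on 2021-05-13 and hist_ref=2, the sketch loads from 2021-05-12 so that the first updated day (2021-05-14) has two days of history available.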
-
prepare_data
() → qlib.data.dataset.DatasetH¶ Load dataset
Separating this function will make it easier to reuse the dataset
Returns: the instance of DatasetH Return type: DatasetH
-
update
(dataset: qlib.data.dataset.DatasetH = None)¶ Update the data in a recorder.
Parameters: dataset (DatasetH) – the instance of DatasetH. None means the dataset is prepared again.
-
get_update_data
(dataset: qlib.data.dataset.Dataset) → pandas.core.frame.DataFrame¶ Return the updated data based on the given dataset.
The difference between get_update_data and update:
- get_update_data only includes data-specific logic
- update includes the general routine steps (e.g. preparing the dataset, checking)
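The split between update and get_update_data resembles a template-method pattern. The following is a minimal sketch of that pattern, assuming nothing about Qlib internals: `SketchUpdater` and `SketchPredUpdater` are hypothetical classes, the "dataset" is a plain list of date strings, and the stored artifact is a dict rather than a DataFrame.

```python
class SketchUpdater:
    """Hypothetical stand-in for a dataset-based updater (not a Qlib class)."""

    def __init__(self, stored):
        self.stored = dict(stored)  # existing artifact, keyed by date string

    def prepare_data(self):
        # General step: build the dataset to run on (here: just the new dates).
        return ["2021-05-27", "2021-05-28"]

    def get_update_data(self, dataset):
        # Data-specific step: subclasses decide what the new rows contain.
        raise NotImplementedError

    def update(self, dataset=None):
        # General routine: prepare if needed, compute new rows, merge them in.
        if dataset is None:
            dataset = self.prepare_data()
        new_rows = self.get_update_data(dataset)
        self.stored.update(new_rows)
        return self.stored


class SketchPredUpdater(SketchUpdater):
    def get_update_data(self, dataset):
        # Placeholder "prediction": a constant score for each new date.
        return {day: 0.0 for day in dataset}


updater = SketchPredUpdater({"2021-05-26": 0.1})
result = updater.update()
print(result)
```

In this arrangement a subclass overrides only the data-specific method, while the shared routine stays in the base class.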
-
-
class
qlib.workflow.online.update.
PredUpdater
(record: qlib.workflow.recorder.Recorder, to_date=None, hist_ref: int = 0, freq='day', fname='pred.pkl')¶ Update the prediction in the Recorder
-
get_update_data
(dataset: qlib.data.dataset.Dataset) → pandas.core.frame.DataFrame¶ Return the updated data based on the given dataset.
The difference between get_update_data and update:
- get_update_data only includes data-specific logic
- update includes the general routine steps (e.g. preparing the dataset, checking)
-
-
class
qlib.workflow.online.update.
LabelUpdater
(record: qlib.workflow.recorder.Recorder, to_date=None, **kwargs)¶ Update the label in the recorder
Assumption - The label is generated from record_temp.SignalRecord.
-
__init__
(record: qlib.workflow.recorder.Recorder, to_date=None, **kwargs)¶ Init PredUpdater.
Parameters: - record – Recorder
- to_date – update the prediction up to to_date
- hist_ref – int. Sometimes the dataset depends on historical data. It is left to users to set the length of this historical dependency.
Note
the start_time is not included in the hist_ref
-
get_update_data
(dataset: qlib.data.dataset.Dataset) → pandas.core.frame.DataFrame¶ Return the updated data based on the given dataset.
The difference between get_update_data and update:
- get_update_data only includes data-specific logic
- update includes the general routine steps (e.g. preparing the dataset, checking)
-
Utils¶
Serializable¶
Serializable changes the behavior of pickle.
- It only saves the state whose name does not start with _.
It provides syntactic sugar for distinguishing the attributes that the user does not want to dump.
- For example, a learnable DataHandler just wants to save its parameters, without the data, when dumping to disk.
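The underscore rule can be illustrated with the standard pickle hooks alone. This is a sketch under stated assumptions, not Qlib's code: `SkipPrivateState` and `SketchHandler` are hypothetical names, and only the "skip names starting with _" behavior is modeled.

```python
import pickle


class SkipPrivateState:
    """Hypothetical mixin: pickle only attributes not starting with '_'."""

    def __getstate__(self):
        # Keep public attributes; drop anything whose name starts with "_".
        return {k: v for k, v in self.__dict__.items() if not k.startswith("_")}

    def __setstate__(self, state):
        self.__dict__.update(state)


class SketchHandler(SkipPrivateState):
    def __init__(self):
        self.mean = 0.5         # learned parameter: kept when dumping
        self._data = [1, 2, 3]  # bulky data: dropped when dumping


restored = pickle.loads(pickle.dumps(SketchHandler()))
print(restored.mean, hasattr(restored, "_data"))
```

After the round trip the learned parameter survives while the underscore-prefixed data attribute is absent, which matches the DataHandler use case described above.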
-
qlib.utils.serial.Serializable.
dump_all
¶ Whether the object dumps all of its attributes
-
qlib.utils.serial.Serializable.
exclude
¶ Which attributes will not be dumped