Challenge Data

Reconstruction of Liquid Asset Performance
by QRT




Description


Competitive challenge
Economic sciences
Finance
Classification
Time series
10MB to 1GB
Basic level

Dates

Started on Jan. 4, 2021


Challenge context

In quantitative finance, a predictive signal refers to an indicator which triggers the action to either buy or sell an asset. At QRT, we are generating predictive signals on various marketplaces leading to the trading (buy or sell) of a large number of assets. A part of these signals might be generated on assets which can be considered as illiquid.

In finance, the liquidity of an asset refers to the ease with which it can be bought or sold. For example, Apple stock (one of the biggest companies in the world) is considered as being a liquid asset whereas Alten stock (a French multinational consulting firm) can be considered as being illiquid and thus, more difficult to trade in high quantities. As illiquid assets are difficult to trade directly, we are interested in quantifying the relationships between illiquid and liquid assets in order to transfer trading orders from illiquid assets to liquid ones.

Feel free to visit and register on our dedicated forum at https://challengedata.qube-rt.com/ for more information about the challenge, the data and QRT.


Challenge goals

If we find an illiquid asset to be untradeable, then the signal of this asset should not result in a trading position. To counteract this difficulty, an alternative would be to project the signals from illiquid assets to liquid ones.

To do so, the proposed challenge aims at determining the link at a given time $t$ between the returns of illiquid and liquid assets. The one-day return of a stock $j$ at a time $t$ with price $P_j^t$ (adjusted for dividends and stock splits) is defined as:

$$R_j^t = \frac{P_j^t}{P_j^{t-1}} - 1$$
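As a small illustration of this definition, the sketch below computes one-day returns from a list of adjusted prices. The function name `one_day_returns` and the price values are hypothetical and are not part of the challenge data.

```python
def one_day_returns(prices):
    """Return R^t = P^t / P^(t-1) - 1 for each day after the first."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


# Hypothetical adjusted closing prices for one stock over four days.
prices = [100.0, 102.0, 99.96, 99.96]
returns = one_day_returns(prices)
# Roughly +2%, then -2%, then a flat day.
```

The first day has no previous price, so a series of four prices yields three returns.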

Let $\mathbf Y^t = (R_1^t, \dots, R_L^t)$ be the returns of $L$ liquid assets and $\mathbf X^t = (R_1^t, \dots, R_N^t)$ be the returns of $N$ illiquid assets at a given time $t$. The objective of this challenge is to determine a mapping function $\eta: \mathbb R^N \rightarrow \mathbb R^L$ that links the returns of the $N$ illiquid assets to the returns of the $L$ liquid assets such that $\mathbf Y^t = \eta(\mathbf X^t)$.

Since predictive signals can be seen as estimated returns, the signals generated by QRT on $N$ illiquid assets, defined by $\hat{\mathbf X}^t$, can be mapped to projected signals $\hat{\mathbf Y}^t$ on $L$ liquid instruments such that $\hat{\mathbf Y}^t = \eta(\hat{\mathbf X}^t)$. However, since $\eta$ is purely theoretical, the mapping must rely on approximations. Therefore, the idea would be to estimate a model $\hat \eta$ that would predict the returns of $L=100$ liquid instruments, using the returns of $N=100$ illiquid instruments, given historical data.

The model $\hat \eta$ can then be seen as a multi-output prediction of $L$ returns, or as the combination of $L$ models $\hat \eta_j$, for $j = 1, \dots, L$, that would individually predict the return of each liquid instrument $j$.

For simplicity and practical reasons, we chose to transform this challenge into a classification problem. In practice, we are more interested in being right on the trend than on the value. Thus, instead of predicting the returns of the liquid assets, the estimated model $\hat \eta$ shall predict the signs of the liquid assets' returns.

The metric used to judge the quality of the predictions is a custom weighted accuracy defined by:

$$f(\mathbf y, \hat{\mathbf y}) = \frac{1}{\| \mathbf y\|_1} \sum_{i=1}^n |y_i| \times 1_{\hat y_i = \operatorname{sign}(y_i)}$$

where $1_{\hat y_i = \operatorname{sign}(y_i)}$ is equal to 1 if the $i$-th prediction $\hat y_i \in \{-1, 1\}$ is of the same sign as the $i$-th true value $y_i$, and 0 otherwise. This metric gives more importance to the correct classification of large returns. Indeed, it can be more important to be right on a 7% move than on a 0.5% move.
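The metric can be sketched in a few lines of plain Python. This is a minimal illustration of the formula above, not the official scoring code. The function name `weighted_accuracy` and the sample values are hypothetical. A true return of exactly zero has zero weight, so how its sign is treated does not change the score.

```python
def weighted_accuracy(y_true, y_pred):
    """Share of total absolute return whose sign was predicted correctly.

    y_true: true returns (floats); y_pred: predicted signs in {-1, 1}.
    """
    total = sum(abs(y) for y in y_true)  # L1 norm of the true returns
    if total == 0:
        raise ValueError("all true returns are zero; the metric is undefined")

    def sign(value):
        return 1 if value > 0 else -1 if value < 0 else 0

    hits = sum(abs(y) for y, p in zip(y_true, y_pred) if sign(y) == p)
    return hits / total


# Hypothetical example: right on the 7% move, wrong on the two small moves.
score = weighted_accuracy([0.07, -0.005, 0.01], [1, 1, -1])
```

In this example only one prediction out of three is correct, yet the score is about 0.82, because the correctly classified move carries most of the total absolute return.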


Data description

3 datasets are provided as csv files, split between training inputs and outputs, and test inputs.

Input datasets comprise 103 columns: the first ID column contains unique row identifiers while the other 102 descriptive features correspond to:

  • ID_DAY: an index of the day (the dates are randomized and anonymized so there is no continuity or link between any dates),
  • RET_i: the return of illiquid asset $i$; there are 100 such illiquid assets,
  • ID_TARGET: the ID of the liquid asset to predict; there are 100 such liquid assets whose return we want to predict.

Output datasets are only composed of 2 columns:

  • ID: the unique row identifier (corresponding to the input identifiers),
  • RET_TARGET: the target, i.e. the return of the liquid asset associated with ID_TARGET.

The solution files submitted by participants shall follow this output dataset format and contain only two columns: ID and RET_TARGET, where the ID values correspond to the input test data and RET_TARGET values correspond to the predictions of the sign of the liquid assets' return. An example submission file containing random predictions is provided.
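A submission in this two-column format could be produced along the following lines. This is only a sketch: the helper `write_submission`, the row identifiers and the predicted values are hypothetical, and mapping a predicted return of exactly zero to +1 is an arbitrary assumption, since the description does not say how ties should be handled.

```python
import csv
import io


def write_submission(ids, predicted_returns, stream):
    """Write ID / RET_TARGET rows, where RET_TARGET is a sign in {-1, 1}."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["ID", "RET_TARGET"])
    for row_id, value in zip(ids, predicted_returns):
        # Assumption: a predicted return of exactly zero is mapped to +1.
        writer.writerow([row_id, 1 if value >= 0 else -1])


# Hypothetical test-set IDs and model outputs, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([267100, 267101, 267102], [0.013, -0.002, 0.0], buffer)
submission_text = buffer.getvalue()
```

To write an actual file, the same function can be given a handle opened with `open(path, "w", newline="")` instead of the in-memory buffer.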

The supplementary dataset is composed of 5 columns:

  • ID_asset: the ID of a liquid or illiquid asset,
  • CLASS_LEVEL_j with $1 \leq j \leq 4$: a sector/industry group identifier to which the corresponding asset belongs. The higher the level $j$, the more specific the industry domain.

267100 samples corresponding to 2748 unique days are available for the training datasets while 114468 samples corresponding to 1177 unique days are used for the test datasets.

The train/test split has been performed randomly along the day variable. As a consequence, no day is shared between the training and test datasets.
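Because the official split is made by day, a local validation split that also keeps all rows of a given ID_DAY on the same side is likely to mimic the test conditions more closely than a row-wise split. The sketch below shows one way to do this. The function `split_by_day`, its parameters and the sample day indices are hypothetical.

```python
import random


def split_by_day(day_ids, validation_fraction=0.2, seed=0):
    """Split row indices so that no ID_DAY appears in both subsets."""
    unique_days = sorted(set(day_ids))
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(unique_days)
    n_val = max(1, round(len(unique_days) * validation_fraction))
    val_days = set(unique_days[:n_val])
    train_idx = [i for i, day in enumerate(day_ids) if day not in val_days]
    val_idx = [i for i, day in enumerate(day_ids) if day in val_days]
    return train_idx, val_idx


# Hypothetical ID_DAY column for ten rows spanning five days.
day_ids = [3, 3, 7, 7, 7, 1, 1, 5, 9, 9]
train_idx, val_idx = split_by_day(day_ids, validation_fraction=0.2, seed=0)
```

With five distinct days and a 20% validation fraction, exactly one day and all of its rows end up in the validation subset.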


Benchmark description

The proposed benchmark is very simple and combines 100 linear regressions to predict the returns. In short, on the training dataset, for each liquid asset $j$, we determine the illiquid asset $k$ with the maximum absolute correlation and we estimate the regression parameter $\hat \beta_{j, k}$ between them. Therefore, the prediction of the sign of liquid asset $j$'s return is defined by $\operatorname{sign}(\hat R_j^t) = \operatorname{sign}(\hat \beta_{j, k} \times R_k^t)$.
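The idea behind the benchmark can be sketched as follows. This is not the official benchmark notebook. It assumes the data have already been rearranged into one return series per asset, aligned by day, and it assumes a least-squares regression without intercept, which matches the prediction formula but is not confirmed by the description. All names (`pearson`, `fit_benchmark`, `predict_sign`, `RET_A`, `RET_B`, `T1`) and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def fit_benchmark(illiquid, liquid):
    """For each liquid asset j, pick the most correlated illiquid asset k
    (in absolute value) and fit a slope beta with no intercept (assumption)."""
    model = {}
    for j, y in liquid.items():
        k = max(illiquid, key=lambda name: abs(pearson(illiquid[name], y)))
        x = illiquid[k]
        denom = sum(a * a for a in x)
        beta = sum(a * b for a, b in zip(x, y)) / denom if denom else 0.0
        model[j] = (k, beta)
    return model


def predict_sign(model, j, illiquid_row):
    """Predicted sign of liquid asset j's return for one day of illiquid returns."""
    k, beta = model[j]
    # Assumption: a prediction of exactly zero is mapped to +1.
    return 1 if beta * illiquid_row[k] >= 0 else -1


# Hypothetical toy data: T1 is exactly -2 times RET_A on every day.
illiquid = {"RET_A": [0.01, -0.02, 0.03, -0.01],
            "RET_B": [0.02, 0.01, -0.01, 0.00]}
liquid = {"T1": [-0.02, 0.04, -0.06, 0.02]}
model = fit_benchmark(illiquid, liquid)
prediction = predict_sign(model, "T1", {"RET_A": 0.05, "RET_B": 0.0})
```

On this toy data the selected asset for `T1` is `RET_A` with a slope close to -2, so a positive return on `RET_A` leads to a predicted sign of -1.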

A more detailed notebook is available on our forum to help you get started in the competition.


Files


Files are accessible when logged in and registered to the challenge


The challenge provider



Qube Research & Technologies Group is a quantitative and systematic investment manager employing around 300 people with offices in Hong Kong, London, Mumbai, Paris and Singapore. We are a technology driven firm implementing a scientific approach to financial investment. QRT’s market presence is global and expands across the largest liquid electronic venues. The combination of data, research, technology and trading expertise has shaped our DNA and is at the heart of our innovation and development dynamic. The firm acts as an investment manager managing open-ended Funds used for management of third party capital.