# Challenge Data

### Reconstruction of Liquid Asset Performance by QRT

#### Description

Competitive challenge
Economic sciences
Finance
Classification
Time series
10MB to 1GB
Basic level

##### Dates

Started on Jan. 4, 2021

##### Challenge context

In quantitative finance, a predictive signal refers to an indicator which triggers the action to either buy or sell an asset. At QRT, we are generating predictive signals on various marketplaces leading to the trading (buy or sell) of a large number of assets. A part of these signals might be generated on assets which can be considered as illiquid.

In finance, the liquidity of an asset refers to the ease with which it can be bought or sold. For example, Apple stock (one of the biggest companies in the world) is considered a liquid asset, whereas Alten stock (a French multinational consulting firm) can be considered illiquid and thus more difficult to trade in large quantities. As illiquid assets are difficult to trade directly, we are interested in quantifying the relationships between illiquid and liquid assets in order to transfer trading orders from illiquid assets to liquid ones.

Feel free to visit and register on our dedicated forum at https://challengedata.qube-rt.com/ for more information about the challenge, the data and QRT.

##### Challenge goals

If we find an illiquid asset to be untradeable, then the signal of this asset should not result in a trading position. To counteract this difficulty, an alternative would be to project the signals from illiquid assets to liquid ones.

To do so, the proposed challenge aims at determining the link at a given time $t$ between the returns of illiquid and liquid assets. The one-day return of a stock $j$ at a time $t$ with price $P_j^t$ (adjusted for dividends and stock splits) is defined as:

$R_j^t = \frac{P_j^t}{P_j^{t-1}} - 1$
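As a minimal illustration of this definition, the sketch below computes one-day returns from a short list of adjusted prices. The function name `one_day_returns` and the sample prices are hypothetical and are not part of the challenge data.

```python
def one_day_returns(prices):
    """Return P_t / P_{t-1} - 1 for each consecutive pair of adjusted prices."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


# Hypothetical adjusted closing prices over three consecutive days.
prices = [100.0, 102.0, 99.96]
returns = one_day_returns(prices)  # roughly +2% on the first day, -2% on the second
```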

Let $\mathbf Y^t = (R_1^t, \dots, R_L^t)$ be the returns of $L$ liquid assets and $\mathbf X^t = (R_1^t, \dots, R_N^t)$ be the returns of $N$ illiquid assets at a given time $t$. The objective of this challenge is to determine a mapping function $\eta: \mathbb R^N \rightarrow \mathbb R^L$, that would link the returns of $N$ illiquid assets to the returns of $L$ liquid assets such that $\mathbf Y^t = \eta(\mathbf X^t)$.

Since predictive signals can be seen as estimated returns, the signals generated by QRT on $N$ illiquid assets, defined by $\hat {\mathbf X}^t$, can be mapped to projected signals $\hat{\mathbf Y}^t$ on $L$ liquid instruments such that $\hat{\mathbf Y}^t = \eta(\hat{\mathbf X}^t)$. However, since $\eta$ is purely theoretical, the mapping must rely on approximations. Therefore, the idea would be to estimate a model $\hat \eta$ that would predict the returns of $L=100$ liquid instruments, using the returns of $N=100$ illiquid instruments, given historical data.

The model $\hat \eta$ can then be seen as a multi-output prediction of $L$ returns, or as the combination of $L$ models $\hat \eta_j$, for $j = 1, \dots, L$, that would individually predict the return of each liquid instrument $j$.

For simplicity and practical reasons, we chose to transform this challenge into a classification problem. In practice, we are more interested in being right on the trend than on the value. Thus, instead of predicting the returns of the liquid assets, the estimated model $\hat \eta$ shall predict the signs of the liquid assets' returns.

The metric used to judge the quality of the predictions is a custom weighted accuracy defined by:

$f(\mathbf y, \hat{\mathbf y}) = \frac{1}{\| \mathbf y\|_1} \sum_{i=1}^n |y_i| \times 1_{\hat y_i = sign(y_i)}$

where $1_{\hat y_i = sign(y_i)}$ is equal to 1 if the $i$-th prediction $\hat y_i \in \{-1, 1\}$ has the same sign as the $i$-th true value $y_i$, and 0 otherwise. This metric gives more weight to the correct classification of large-magnitude returns. Indeed, it can be more important to be right on a 7% move than on a 0.5% move.
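The metric can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the official scoring code: the name `weighted_accuracy` is hypothetical, and a true return of exactly zero is treated as contributing nothing, since its weight $|y_i|$ is zero.

```python
def weighted_accuracy(y_true, y_pred):
    """Share of total absolute return for which the predicted sign is correct.

    y_true: true returns (floats); y_pred: predicted signs in {-1, 1}.
    """
    total = sum(abs(y) for y in y_true)
    if total == 0:
        raise ValueError("all true returns are zero; the metric is undefined")
    # Sum |y_i| only over rows where the predicted sign matches the true sign.
    hit = sum(
        abs(y)
        for y, p in zip(y_true, y_pred)
        if (y > 0 and p == 1) or (y < 0 and p == -1)
    )
    return hit / total


# Hypothetical example: only the 7% move is classified correctly.
y_true = [0.07, -0.005, 0.01]
y_pred = [1, 1, -1]
score = weighted_accuracy(y_true, y_pred)
```

In this example only one prediction out of three is right, so the unweighted accuracy would be about 0.33, but the weighted accuracy is 0.07 / 0.085, roughly 0.82, because the correct prediction falls on the largest move.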

##### Data description

Three datasets are provided as CSV files, split between training inputs and outputs, and test inputs.

Input datasets comprise 103 columns: the first ID column contains unique row identifiers while the other 102 descriptive features correspond to:

• ID_DAY: an index of the day (the dates are randomized and anonymized so there is no continuity or link between any dates),
• RET_i: the return of illiquid asset $i$; there are 100 such illiquid assets,
• ID_TARGET: the ID of the liquid asset to predict; there are 100 such liquid assets whose return we want to predict.

Output datasets contain only two columns:

• ID: the unique row identifier (corresponding to the input identifiers),
• RET_TARGET: the target, i.e. the return of the liquid asset associated with ID_TARGET.

The solution files submitted by participants shall follow this output dataset format and contain only two columns: ID and RET_TARGET, where the ID values correspond to the input test data and RET_TARGET values correspond to the predictions of the sign of the liquid assets' return. An example submission file containing random predictions is provided.
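A submission in this two-column format could be produced as in the sketch below. The helper `write_submission`, the sample IDs and the rule of mapping a predicted return of exactly zero to -1 are assumptions made for illustration; only the column names ID and RET_TARGET come from the description above.

```python
import csv
import io


def write_submission(ids, predicted_returns, handle):
    """Write ID and the predicted sign (in {-1, 1}) as RET_TARGET, one row per ID."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["ID", "RET_TARGET"])
    for row_id, value in zip(ids, predicted_returns):
        # Assumption: non-positive predictions (including zero) are mapped to -1.
        writer.writerow([row_id, 1 if value > 0 else -1])


# Hypothetical test IDs and predicted returns, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([0, 1, 2], [0.013, -0.004, 0.0], buffer)
submission_text = buffer.getvalue()
```

To write an actual file, the same function can be given a handle obtained from `open(path, "w", newline="")` instead of the in-memory buffer.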

The supplementary dataset is composed of 5 columns:

• ID_asset: the ID of a liquid or illiquid asset,
• CLASS_LEVEL_j with $1 \leq j \leq 4$: a sector/industry group identifier to which the corresponding asset belongs. The higher the level $j$, the more specific the industry domain.

267100 samples corresponding to 2748 unique days are available for the training datasets while 114468 samples corresponding to 1177 unique days are used for the test datasets.

The train/test split has been performed randomly along the day variable. As a consequence, no day is shared between the training and test datasets.
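Because the split is made along the day variable, a local validation set can mirror it by holding out whole days rather than individual rows. The sketch below is one possible way to do so; the function `split_by_day`, its parameters and the toy day index are hypothetical.

```python
import random


def split_by_day(day_ids, holdout_fraction=0.2, seed=0):
    """Split row indices so that every day falls entirely in one of the two sets."""
    unique_days = sorted(set(day_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_days)
    n_holdout = max(1, round(len(unique_days) * holdout_fraction))
    holdout_days = set(unique_days[:n_holdout])
    train_idx = [i for i, d in enumerate(day_ids) if d not in holdout_days]
    valid_idx = [i for i, d in enumerate(day_ids) if d in holdout_days]
    return train_idx, valid_idx


# Hypothetical ID_DAY column: 10 days with 3 rows each.
day_ids = [day for day in range(10) for _ in range(3)]
train_idx, valid_idx = split_by_day(day_ids)
```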

##### Benchmark description

The proposed benchmark is very simple and combines 100 linear regressions to predict the returns. In short, on the training dataset, for each liquid asset $j$, we determine the illiquid asset $k$ with the maximum absolute correlation and we estimate the regression parameter $\hat \beta_{j, k}$ between them. Therefore, the prediction of the sign of liquid asset $j$'s return is defined by $sign(\hat R_j^t) = sign(\hat \beta_{j, k} \times R_k^t)$.
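The benchmark logic for a single liquid asset can be sketched as follows. This is not the official benchmark notebook: the function names and toy data are hypothetical, and the slope is estimated by least squares without an intercept, which is an assumption suggested by the form $\hat \beta_{j, k} \times R_k^t$ of the prediction.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def fit_benchmark_for_target(illiquid_rows, target_returns):
    """Pick the illiquid asset k with maximum |correlation| and fit the slope beta."""
    n_assets = len(illiquid_rows[0])
    columns = [[row[k] for row in illiquid_rows] for k in range(n_assets)]
    best_k = max(range(n_assets),
                 key=lambda k: abs(pearson(columns[k], target_returns)))
    x = columns[best_k]
    # Assumption: least-squares slope through the origin (no intercept).
    beta = sum(a * b for a, b in zip(x, target_returns)) / sum(a * a for a in x)
    return best_k, beta


def predict_sign(beta, illiquid_return):
    """Sign of beta * R_k, with non-positive products mapped to -1."""
    return 1 if beta * illiquid_return > 0 else -1


# Hypothetical training data: 4 days, 2 illiquid assets; the target is -2 * asset 0.
illiquid_rows = [[0.01, 0.02], [-0.02, 0.01], [0.03, -0.01], [-0.01, 0.00]]
target_returns = [-0.02, 0.04, -0.06, 0.02]
best_k, beta = fit_benchmark_for_target(illiquid_rows, target_returns)
```

Repeating this fit for each of the 100 liquid assets, on the training rows whose ID_TARGET matches that asset, would give the 100 regressions described above.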

A more detailed notebook is available on our forum to help you get started in the competition.

#### Files

Files are accessible when logged in and registered to the challenge

#### The challenge provider

Qube Research & Technologies Group is a quantitative and systematic investment manager employing around 300 people with offices in Hong Kong, London, Mumbai, Paris and Singapore. We are a technology-driven firm implementing a scientific approach to financial investment. QRT's market presence is global and expands across the largest liquid electronic venues. The combination of data, research, technology and trading expertise has shaped our DNA and is at the heart of our innovation and development dynamic. The firm acts as an investment manager managing open-ended Funds used for management of third-party capital.