Challenge data

Description

Competitive challenge

Economic sciences

Finance

Classification

Time series

10MB to 1GB

Intermediary level

Dates

Started on Jan. 6, 2020

Challenge context

Quantitative investment strategies rely on the analysis of historical data to predict the trend of a stock in the near future. However, the extremely low signal-to-noise ratio makes this a very challenging problem. Extracting weak signals from the enormous amount of data available in the market is a key goal for Qube RT. To do so, Machine Learning techniques can be used to make better trading decisions through deep analysis of thousands of different data sources. In a financial world in constant movement, it is extremely difficult to detect the patterns that make a stock move up or down. This challenge is an illustration of the stock prediction problem.

Feel free to visit and register on our dedicated forum at https://challengedata.qube-rt.com/ for more information about the challenge, the data and QRT.

You can watch a video discussing the challenge here.

Challenge goals

The challenge aims at predicting the return of a stock in the US market using historical data over a recent period of 20 days. The one-day return of a stock $j$ on day $t$ with price $P_j^t$ (adjusted for dividends and stock splits) is given by:

$R_j^t = \frac{P_j^t}{P_j^{t-1}} - 1$
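As an illustration of the formula above, here is a minimal sketch that computes one-day returns from a chronological list of adjusted prices. The function name `one_day_returns` and the sample prices are made up for the example and are not part of the challenge material.

```python
def one_day_returns(prices):
    """Return R^t = P^t / P^(t-1) - 1 for each consecutive pair of prices.

    `prices` is a chronological sequence (oldest first) of adjusted prices
    for a single stock. The result has one element fewer than the input.
    """
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


# Made-up adjusted closing prices for three consecutive days.
prices = [100.0, 102.0, 99.96]
returns = one_day_returns(prices)  # a 2% gain followed by a 2% loss
```

Note that the challenge data contain residual returns (with the market impact removed), not the raw returns computed here.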

In this challenge, we consider the residual stock return, which corresponds to the return of a stock without the market impact. Historical data are composed of residual stock returns and relative volumes, sampled each day over the last 20 business days (approximately one month). The relative volume $\mathcal V_j^t$ at time $t$ of a stock $j$ among the $n$ stocks is defined by:

$\begin{aligned} \bar{V}_j^t &= \frac{V_j^t}{\operatorname{median}(\{ V_j^{t-1}, \dots, V_j^{t-20}\})} \\ \mathcal V_j^t &= \bar{V}_j^t - \frac{1}{n} \sum_{i=1}^n \bar{V}_i^t \end{aligned}$

where $V_j^t$ is the volume at time $t$ of a stock $j$. We also give additional information about each stock such as its industry and sector.
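The two-step definition of the relative volume can be sketched as follows: each stock's volume on day $t$ is divided by the median of its own previous 20 daily volumes, then the cross-sectional mean over the $n$ stocks is subtracted. The function name `relative_volumes`, the input layout (one oldest-first list per stock) and the sample numbers are assumptions made for this example only.

```python
from statistics import median

WINDOW = 20  # number of past days entering the median


def relative_volumes(volumes_by_stock):
    """Compute the relative volume of every stock on the most recent day.

    `volumes_by_stock` maps a stock identifier to a chronological list
    (oldest first) of at least WINDOW + 1 raw daily volumes; the last
    element is the volume on day t.
    """
    normalized = {}
    for stock, series in volumes_by_stock.items():
        if len(series) < WINDOW + 1:
            raise ValueError(f"need at least {WINDOW + 1} volumes for {stock}")
        history = series[-(WINDOW + 1):-1]  # days t-20 .. t-1
        today = series[-1]                  # day t
        normalized[stock] = today / median(history)
    # Subtract the cross-sectional mean so the values are centered at zero.
    cross_mean = sum(normalized.values()) / len(normalized)
    return {stock: v - cross_mean for stock, v in normalized.items()}


# Two made-up stocks: "A" trades 1.5x its usual volume, "B" trades 0.5x.
volumes = {
    "A": [100.0] * 20 + [150.0],
    "B": [200.0] * 20 + [100.0],
}
relative = relative_volumes(volumes)
```

By construction the relative volumes sum to zero across stocks on a given day, so a positive value means a stock is trading more actively than its peers relative to its own recent history.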

The metric considered is the accuracy of the predicted residual stock return sign.
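For concreteness, the metric can be sketched as the fraction of rows where the predicted sign matches the true sign. The function name `sign_accuracy` is hypothetical, and treating a return of exactly zero as non-positive is an assumption of this sketch; the source does not say how zeros are handled.

```python
def sign_accuracy(true_returns, predicted_returns):
    """Fraction of observations whose predicted sign matches the true sign.

    Assumption: a return is labeled positive only if it is strictly
    greater than zero.
    """
    if len(true_returns) != len(predicted_returns):
        raise ValueError("inputs must have the same length")
    if not true_returns:
        raise ValueError("inputs must not be empty")
    matches = sum(
        (t > 0) == (p > 0) for t, p in zip(true_returns, predicted_returns)
    )
    return matches / len(true_returns)


# Made-up values: the second prediction has the wrong sign.
score = sign_accuracy([0.01, -0.02, 0.005, -0.001], [0.2, 0.1, 0.3, -0.5])
```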

Data description

Three datasets are provided as CSV files: training inputs, training outputs, and test inputs.

Input datasets comprise 47 columns: the first ID column contains unique row identifiers while the other 46 descriptive features correspond to:

• DATE: an index of the date (the dates are randomized and anonymized so there is no continuity or link between any dates),
• STOCK: an index of the stock,
• INDUSTRY: an index of the stock industry domain (e.g., aeronautic, IT, oil company),
• INDUSTRY_GROUP: an index of the group industry,
• SUB_INDUSTRY: a lower level index of the industry,
• SECTOR: an index of the work sector,
• RET_1 to RET_20: the historical residual returns over the last 20 days (i.e., RET_1 is the return of the previous day and so on),
• VOLUME_1 to VOLUME_20: the historical relative volume traded over the last 20 days (i.e., VOLUME_1 is the relative volume of the previous day and so on).
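The column layout listed above can be written down programmatically, which is handy for checking a loaded file. This is only a sketch: the helper name `missing_columns` is hypothetical, and the order of columns in the actual CSV files is not stated in this description, so only the names are checked.

```python
ID_COLUMN = "ID"
CATEGORICAL_COLUMNS = [
    "DATE", "STOCK", "INDUSTRY", "INDUSTRY_GROUP", "SUB_INDUSTRY", "SECTOR",
]
RETURN_COLUMNS = [f"RET_{k}" for k in range(1, 21)]
VOLUME_COLUMNS = [f"VOLUME_{k}" for k in range(1, 21)]

# 1 identifier + 6 categorical indices + 20 returns + 20 volumes = 47 columns.
INPUT_COLUMNS = (
    [ID_COLUMN] + CATEGORICAL_COLUMNS + RETURN_COLUMNS + VOLUME_COLUMNS
)


def missing_columns(header):
    """Return the expected input columns that are absent from `header`."""
    present = set(header)
    return [name for name in INPUT_COLUMNS if name not in present]
```

For example, `missing_columns(["ID", "DATE"])` lists the 45 names that a file with only those two columns would lack.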

Output datasets contain only 2 columns:

• ID: the unique row identifier (corresponding to the input identifiers),
• RET: the binary target, i.e., the sign of the residual stock return at time $t$.

The solution files submitted by participants must follow this output dataset format (i.e., contain only two columns, ID and RET, where the ID values correspond to the input test data). An example submission file containing random predictions is provided.
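A submission in this two-column format could be written with the standard `csv` module, as in the sketch below. The function name `write_submission` is hypothetical, and encoding the binary target as `True`/`False` is an assumption; the provided example submission file is the authoritative reference for the expected encoding.

```python
import csv
import io


def write_submission(ids, predictions, handle):
    """Write an ID,RET file to an open text handle.

    Assumption: RET is encoded as True/False. Check the example
    submission file before relying on this.
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["ID", "RET"])
    for row_id, prediction in zip(ids, predictions):
        writer.writerow([row_id, bool(prediction)])


# Made-up identifiers and predictions, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([10, 11], [True, False], buffer)
submission_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` as the handle.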

The training datasets contain 418,595 observations (i.e., rows), and the test datasets contain 198,429 observations.

Benchmark description

We propose a simple baseline using a Random Forest fitted with 500 trees and a maximum depth of 8. Only the 5 most recent stock returns and relative volumes are used, along with STOCK and an additional feature representing the mean of RET_1 conditional on RANK and SECTOR. Missing values are filled with 0.

The public score of this benchmark is 51.31%. A notebook explaining the generation of the benchmark is available in the supplementary files.
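The benchmark described above could be approximated along the following lines with pandas and scikit-learn. This is a sketch under stated assumptions, not the notebook from the supplementary files: the conditional mean is grouped by DATE and SECTOR (an assumption, chosen because those are columns listed in the data description), the feature name `RET_1_GROUP_MEAN` and the function names are hypothetical, and missing values are filled after the conditional mean is computed, an ordering the description does not specify.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

N_LAGS = 5  # only the 5 most recent returns and volumes are used
LAG_FEATURES = (
    [f"RET_{k}" for k in range(1, N_LAGS + 1)]
    + [f"VOLUME_{k}" for k in range(1, N_LAGS + 1)]
)
GROUP_KEYS = ["DATE", "SECTOR"]        # assumption, see the text above
GROUP_MEAN_NAME = "RET_1_GROUP_MEAN"   # hypothetical feature name


def build_features(inputs: pd.DataFrame) -> pd.DataFrame:
    """Select the lagged features and STOCK, and add the conditional mean."""
    features = inputs[LAG_FEATURES + ["STOCK"]].copy()
    # Mean of RET_1 within each group, broadcast back to every row.
    features[GROUP_MEAN_NAME] = (
        inputs.groupby(GROUP_KEYS)["RET_1"].transform("mean")
    )
    return features.fillna(0)  # missing values are filled with 0


def fit_benchmark(inputs, target, n_estimators=500, max_depth=8, seed=0):
    """Fit a random forest with the hyperparameters quoted above."""
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=seed
    )
    model.fit(build_features(inputs), target)
    return model
```

Given training inputs `x_train` and a boolean target `y_train` loaded as DataFrames (names chosen here for illustration), `fit_benchmark(x_train, y_train["RET"])` returns a fitted model, and `model.predict(build_features(x_test))` yields the predicted signs. Scores from this sketch should not be expected to match the published benchmark exactly.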

Files

Files are accessible when logged in and registered to the challenge

The challenge provider

Qube Research & Technologies (QRT) is a global quantitative and systematic investment manager, operating in all asset classes across the world. Driven by a shared passion for data, research, technology and trading expertise, we strive to deliver high-quality returns for our investors. Established in 2016, QRT can rely on its 1,400+ employees across 11 offices in Europe, the Middle East and Asia Pacific. QRT supports various coding initiatives as well as academic projects developing and promoting maths and science education.

Congratulations to the winners of the challenge

1 Romain Poncet
2 -
3 Agnès François, Jessy Idez

You can find the whole list of winners of the season here

Stock Return Prediction
by QRT
