Challenge Data

Prediction of daily stock end-of-day movements on the US market by CFM

Description

Dates

Started on Jan. 6, 2023

Challenge context

The American stock market is the most liquid equity market on the planet and hence provides many investment opportunities. The last two hours of the trading session, between 2 PM and 4 PM, are the most liquid.

These liquid periods are the preferred time for buying or selling large quantities of assets, because transaction costs are smaller and volatility is usually lower. Estimating the price in advance for this period therefore makes it possible to adapt the scale of transactions and further optimize portfolio costs.

Challenge goals

The goal is to estimate the main direction of the price during the last two hours of the trading session, given the preceding history of the day.

To avoid being affected by the usual market noise, we consider only 3 states:

• a clear decrease in price;
• a small move in either direction;
• a clear increase in price.

Data description

We provide input price evolutions (returns) at a granularity of 5 minutes, which gives 53 values (about 4.5 hours) per day per equity.

As price movements are very small over such time windows, returns are given in basis points (bps), i.e. $\frac{P_{t + 5\,\text{min}} - P_{t}}{P_{t}} * 10^4$.
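As a minimal sketch of this definition, the snippet below converts a list of consecutive 5-minute prices into returns in basis points. The function name `returns_bps` and the sample prices are hypothetical and are not part of the challenge files.

```python
def returns_bps(prices):
    """Convert consecutive prices into returns expressed in basis points.

    `prices` is assumed to hold one price every 5 minutes; the result has
    one element fewer than the input.
    """
    return [
        (p_next - p) / p * 1e4
        for p, p_next in zip(prices, prices[1:])
    ]


# Hypothetical prices at 09:30, 09:35 and 09:40 (illustration only).
sample_prices = [100.0, 100.1, 100.0]
sample_returns = returns_bps(sample_prices)
print(sample_returns)
```

A move from 100.0 to 100.1 is a return of 0.1%, that is, about 10 bps.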

Each row contains the following columns:

• 'ID', the unique input identifier;
• 'day', the day identifier (not unique within the datasets);
• 'equity', the equity identifier (not unique within the datasets);
• 'r0', $\frac{P_{09:35} - P_{09:30}}{P_{09:30}} * 10^4$, the return over the first 5-minute window;
• 'r1', $\frac{P_{09:40} - P_{09:35}}{P_{09:35}} * 10^4$, the return over the next 5-minute window;
• ...;
• 'r52', $\frac{P_{14:00} - P_{13:55}}{P_{13:55}} * 10^4$, the return over the last window.
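Because each column is a simple return, the 53 values of a row can be compounded into a single return over the whole input period. The sketch below assumes a row is available as a dictionary keyed by the column names above; `RETURN_COLUMNS` and `cumulative_return_bps` are hypothetical names introduced for illustration.

```python
import math

# Column names 'r0' ... 'r52', as listed in the data description.
RETURN_COLUMNS = [f"r{i}" for i in range(53)]


def cumulative_return_bps(row):
    """Compound the 5-minute returns of one row into a single return in bps.

    Each return r (in bps) corresponds to a growth factor 1 + r / 10^4;
    the factors are multiplied and converted back to bps.
    """
    growth = math.prod(1.0 + row[col] / 1e4 for col in RETURN_COLUMNS)
    return (growth - 1.0) * 1e4


# Hypothetical row: +100 bps in each of the first two windows, flat afterwards.
example_row = {col: 0.0 for col in RETURN_COLUMNS}
example_row["r0"] = 100.0
example_row["r1"] = 100.0
print(cumulative_return_bps(example_row))
```

For moves this small, simply summing the columns gives nearly the same number; compounding is the exact version.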

To reduce the difficulty of the prediction task, we restrict it to classifying the final return into 3 categories delimited by $\pm 25$ bps, so the output is:

• $-1$ if $\frac{P_{16:00} - P_{14:00}}{P_{14:00}} * 10^4$ is below $-25$ bps;
• $0$ if this value is between $-25$ and $25$ bps;
• $+1$ if it is greater than $25$ bps.
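The labeling rule above can be sketched as a small function. The text does not say how a return of exactly $-25$ or $25$ bps is handled; the sketch assumes such values fall in class $0$. The names `reod_class` and `THRESHOLD_BPS` are hypothetical.

```python
THRESHOLD_BPS = 25.0  # class boundary in basis points, from the description


def reod_class(price_14h, price_16h, threshold=THRESHOLD_BPS):
    """Return -1, 0 or +1 for the move between 14:00 and 16:00.

    Assumption: returns exactly equal to +/- threshold are mapped to 0.
    """
    ret_bps = (price_16h - price_14h) / price_14h * 1e4
    if ret_bps < -threshold:
        return -1
    if ret_bps > threshold:
        return 1
    return 0


# Hypothetical prices (illustration only).
print(reod_class(100.0, 100.5))  # about +50 bps -> 1
print(reod_class(100.0, 99.5))   # about -50 bps -> -1
print(reod_class(100.0, 100.1))  # about +10 bps -> 0
```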

The outputs to predict are given as follows:

• 'ID', the unique input identifier, matching the one in the input data;
• 'reod', the class of the return during the end-of-day period, in $\{-1, 0, 1\}$.

The training set and test set share neither the same days nor the same equities. Nevertheless, all equities come from the same markets and share the same distribution. The days of the two datasets, however, come from entirely different periods, in order to reflect a real prediction task, with fresh real-world data that may have new characteristics.

Benchmark description

As this is a classification problem with only 3 possible outcomes, random (or even fixed!) responses might lead to a score of around $33\%$. This is an easy sanity check for your solution.
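This sanity check can be sketched on synthetic labels. The snippet assumes the score is plain accuracy (the fraction of correct classes) and draws balanced labels at random; neither assumption is confirmed by the description, and the labels are not challenge data.

```python
import random

CLASSES = (-1, 0, 1)


def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(0)  # fixed seed so the run is reproducible

# Synthetic, roughly balanced labels (illustration only).
y_true = [rng.choice(CLASSES) for _ in range(30_000)]

random_pred = [rng.choice(CLASSES) for _ in y_true]  # uniform random guess
fixed_pred = [0] * len(y_true)                       # always predict class 0

print(accuracy(y_true, random_pred))
print(accuracy(y_true, fixed_pred))
```

A uniform random guess scores about one third whatever the class balance, whereas a fixed answer scores exactly the frequency of the chosen class, so it is close to one third only when the classes are roughly balanced.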

The benchmark is less naive: it aggregates some basic characteristics of these 53 values and tries to detect patterns based on their 2 main characteristics (day and equity). A basic state-of-the-art classifier then reaches a (test) score of $41.74\%$.
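The exact features and classifier used by the benchmark are not specified here. As a purely illustrative sketch of this kind of aggregation, the snippet below computes a few summary statistics per row and then attaches an average over all rows sharing the same 'day' or 'equity'. The choice of statistics and the names `row_features` and `add_group_mean` are assumptions; the resulting features could be fed to any standard classifier.

```python
from collections import defaultdict
from statistics import mean, pstdev

N_RETURNS = 53  # columns 'r0' ... 'r52'


def row_features(row):
    """Summary statistics of the 53 returns of one row (hypothetical choice)."""
    r = [row[f"r{i}"] for i in range(N_RETURNS)]
    return {
        "mean": mean(r),
        "std": pstdev(r),
        "total": sum(r),
        "min": min(r),
        "max": max(r),
    }


def add_group_mean(rows, features, key):
    """Add to each feature dict the mean 'total' of rows sharing `key`."""
    groups = defaultdict(list)
    for row, feat in zip(rows, features):
        groups[row[key]].append(feat["total"])
    for row, feat in zip(rows, features):
        feat[f"{key}_mean_total"] = mean(groups[row[key]])
    return features


def build_features(rows):
    """Row-level statistics plus per-day and per-equity aggregates."""
    features = [row_features(row) for row in rows]
    add_group_mean(rows, features, "day")
    add_group_mean(rows, features, "equity")
    return features
```

Here `rows` is assumed to be a list of dictionaries, one per input row, keyed by the column names of the data description.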

Files

Files are accessible once you are logged in and registered for the challenge.

The challenge provider

Founded in 1991, Capital Fund Management (CFM) is a successful alternative asset manager and a pioneer in the field of quantitative trading applied to capital markets across the globe. Our methodology relies on statistically robust analysis of terabytes of data to inform and enable our asset allocation, trading decisions and automated order execution. Our people’s diversity and dedication contribute to CFM’s unique culture of research, innovation and achievement. We are a Great Place to Work company and we offer a collaborative and informal work environment, attractive offices and facilities.