The American stock market is the most liquid equity market of the planet and hence provides many opportunities for investments. The last 2 hours of the trading sessions, between 2PM and 4PM, are the more liquid ones.
These liquid periods are the preferred time for buying or selling of huge quantities of assets because of smaller transaction costs and usually decreased volatility. Thus estimating in advance the price during this period allows to adapt the scale of the transactions and further optimize the costs of the portfolios.
Challenge goals
The goal is to estimate the main direction that will occur during the last two hours of trading session, given the preceding history of the day.
To avoid to suffer of usual market noise, we only consider 3 states:
clear decreasing of price;
small evolution in both side;
clear increasing of price.
Data description
We provide input price evolutions (returns), with a granularity of 5
minutes, which leads to 53 values (4.5 hours) per day per equity.
As price movements are really small on such time windows, we give
basis points (bps), so PtβPt+5minutesββPtβββ104.
We give then rows:
'ID', the unique input identifier;
'day', the day identifier (not unique inside dataset(s));
'equity', the equity identifier (not unique inside dataset(s));
'r0', P09:30βP09:35ββP09:30βββ104, the returns of the first 5 minutes window;
'r1', P09:35βP09:40ββP09:35βββ104, the returns of the next 5 minutes window;
β¦
'r52', P13:55βP14:00ββP13:55βββ104, the last returns.
To reduce the prediction task difficulty, we limit the prediction to
the classification of the final returns, in 3 categories, limited by
Β±25bps, so output is:
β1 if P14PMβP16PMββP14PMβββ104 is below β25bps;
0 if this ratio is between β25 and 25bps;
+1 if greater than 25bps.
The output to predict are given such:
'ID', the unique input identifier, correspond to the input ones;
'reod', the class of the returns during the the end of the day
period, in [β1,0,1].
The training set and test set don't share the same days neither same equities.
Nevertheless, all equities are of the same markets and share the same distribution.
However, days of two datasets are from totally different periods, in order to reflect a real task of
prediction, with fresh data coming from real world with potentially new characteristics.
Benchmark description
As this is a classification with only 3 potentials results, random (or even fixed!) responses might
lead to a score around 33%. This is an easy way to test your solution.
The benchmark is less naive, and aggregates some basic
characteristics of these 53 values, and try to detect pattern given
their 2 main characteristics (day and equity).
Then a basic state-of-art classifier leads to a (test) score of
41.74%.
Files
Files are accessible when logged in and registered to the challenge
The challenge provider
Founded in 1991, Capital Fund Management (CFM) is a successful alternative asset manager and a pioneer in the field of quantitative trading applied to capital markets across the globe. Our methodology relies on statistically robust analysis of terabytes of data to inform and enable our asset allocation, trading decisions and automated order execution. Our peopleβs diversity and dedication contribute to CFMβs unique culture of research, innovation and achievement. We are a Great Place to Work company and we offer a collaborative and informal work environment, attractive offices and facilities.