The American stock market is the most liquid equity market in the world and hence provides many investment opportunities. The last two hours of each trading session, between 2 PM and 4 PM, are the most liquid.
These liquid periods are the preferred time for buying or selling large quantities of assets, because transaction costs are smaller and volatility is usually lower. Estimating the price in advance during this period therefore makes it possible to adapt the scale of the transactions and further optimize portfolio costs.
The goal is to estimate the main direction of the price during the last two hours of the trading session, given the preceding history of the day.
To avoid suffering from the usual market noise, we only consider 3 states:
a clear decrease in price;
a small move in either direction;
a clear increase in price.
We provide input price evolutions (returns) at a granularity of 5
minutes, which leads to 53 values (4.5 hours) per day per equity.
As price movements are very small over such time windows, returns are
given in basis points (bps):
$$\frac{P_{t+5\,\text{minutes}} - P_t}{P_t} \times 10^4$$
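As a minimal sketch of this definition, the snippet below computes successive returns in basis points, that is the relative price change over one 5-minute step multiplied by 10^4. The function name `returns_bps` and the sample prices are hypothetical and are not part of the challenge data.

```python
def returns_bps(prices):
    """Successive returns in basis points: (P_next - P_cur) / P_cur * 1e4."""
    return [(nxt - cur) / cur * 1e4 for cur, nxt in zip(prices, prices[1:])]


# Hypothetical prices sampled 5 minutes apart (illustration only).
prices = [100.0, 100.1, 100.05]
print(returns_bps(prices))
```

A series of n prices yields n - 1 returns, so a move from 100.0 to 100.1 corresponds to roughly 10 bps.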
Each row then contains the following fields:
'ID', the unique input identifier;
'day', the day identifier (not unique within the dataset(s));
'equity', the equity identifier (not unique within the dataset(s));
the return of the first 5-minute window;
the return of the next 5-minute window;
the last return.
To reduce the difficulty of the prediction task, we limit it to
classifying the final return into 3 categories, delimited by ±25 bps,
so the output is:
−1 if the return is below −25 bps;
0 if the return is between −25 and 25 bps;
1 if the return is greater than 25 bps.
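The thresholding can be sketched as below. This assumes that the three classes map to −1, 0 and 1 in increasing order of return, and that a return exactly on a threshold falls in the middle class. The text does not specify the boundary case, and the names `classify_reod` and `THRESHOLD_BPS` are hypothetical.

```python
THRESHOLD_BPS = 25.0  # class boundary in basis points, taken from the text


def classify_reod(ret_bps, threshold=THRESHOLD_BPS):
    """Map an end-of-day return (in bps) to a class in {-1, 0, 1}."""
    if ret_bps < -threshold:
        return -1  # clear decrease
    if ret_bps > threshold:
        return 1   # clear increase
    return 0       # small move; boundary values land here (assumption)


print([classify_reod(r) for r in (-40.0, -10.0, 0.0, 25.0, 60.0)])
```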
The outputs to predict are given as follows:
'ID', the unique input identifier, corresponding to the input one;
'reod', the class of the return during the end-of-day period, in
[−1,0,1].
The training set and the test set share neither the same days nor the same equities.
Nevertheless, all equities are from the same markets and share the same distribution.
However, the days of the two datasets come from totally different periods, in order to reflect a real
prediction task, with fresh data coming from the real world and potentially new characteristics.
As this is a classification with only 3 possible results, random (or even fixed!) responses should
lead to a score of around 33%. This is an easy way to test your solution.
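A quick sanity check of this baseline can be sketched as follows, assuming the score is plain accuracy and using synthetic, balanced labels. Both are assumptions made for illustration, since the real class frequencies are not stated here.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


rng = random.Random(0)  # fixed seed for reproducibility
classes = [-1, 0, 1]

# Synthetic labels drawn uniformly; NOT the challenge data.
y_true = [rng.choice(classes) for _ in range(30000)]

random_pred = [rng.choice(classes) for _ in y_true]  # random responses
fixed_pred = [0] * len(y_true)                       # always the same class

random_score = accuracy(y_true, random_pred)
fixed_score = accuracy(y_true, fixed_pred)
print(random_score, fixed_score)
```

Note that a fixed response scores exactly the frequency of the chosen class, so it lands near 33% only when the three classes are roughly balanced.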
The benchmark is less naive: it aggregates some basic
characteristics of these 53 values and tries to detect patterns based
on their 2 main characteristics (day and equity).
Then a basic state-of-the-art classifier leads to a (test) score of
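As a rough illustration of what aggregating basic characteristics could look like, the sketch below summarizes each row's returns and then averages one summary per day and per equity. The benchmark's actual features and classifier are not specified here, so the chosen statistics, the helper names `summarize` and `group_mean`, and the sample rows are all assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize(returns):
    """Basic per-row statistics of the intraday returns (in bps)."""
    return {
        "sum": sum(returns),
        "mean": mean(returns),
        "std": pstdev(returns),
        "min": min(returns),
        "max": max(returns),
    }


def group_mean(rows, key, feature):
    """Average `feature` over all rows sharing the same `key` value."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row[feature])
    return {k: mean(v) for k, v in buckets.items()}


# Hypothetical rows with 3 returns each instead of 53, for brevity.
rows = [
    {"ID": 0, "day": 1, "equity": 10, "returns": [1.0, -2.0, 4.0]},
    {"ID": 1, "day": 1, "equity": 11, "returns": [0.0, 3.0, 3.0]},
    {"ID": 2, "day": 2, "equity": 10, "returns": [-5.0, 1.0, 1.0]},
]
for row in rows:
    row.update(summarize(row["returns"]))

day_mean = group_mean(rows, "day", "sum")        # average cumulative return per day
equity_mean = group_mean(rows, "equity", "sum")  # average cumulative return per equity
print(day_mean, equity_mean)
```

Such per-row, per-day and per-equity summaries could then be fed to any standard classifier.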
The challenge provider
Founded in 1991, Capital Fund Management (CFM) is a successful alternative asset manager and a pioneer in the field of quantitative trading applied to capital markets across the globe. Our methodology relies on statistically robust analysis of terabytes of data to inform and enable our asset allocation, trading decisions and automated order execution. Our people’s diversity and dedication contribute to CFM’s unique culture of research, innovation and achievement. We are a Great Place to Work company and we offer a collaborative and informal work environment, attractive offices and facilities.