The list of winners of these challenges as of the end of 2020 can be found here. The challenges continue to run afterwards and the leaderboards are still updated, but the list of the previous year's winners is fixed.
Given recent trades and order books from a set of trading venues, predict on which trading venue the next trade will be executed.
Community forum for sharing ideas and making faster progress:
http://datachallenge.cfm.fr/
Additional information can also be found on this forum and after registering on the Challenge Data website.
The goal of this problem is to estimate the production of a group of industrial assets, based on daily measurements and capacity constraints.
Consider a number of industrial installations $A_i$ with nominal production capacities $C_i$. $J$ daily measurements ($J = 5$), denoted $x^{j}_{i,w,l}$, are carried out on each asset $i$, for week $w$, weekday $l$ ($l = 1, \dots, 7$) and measure type $j$, in order to detect patterns in operations that affect production.
The assets $A_i$ are gathered into $K$ disjoint groups ($K = 2$). For each group, installations report the actual production levels $y_{k,w}$ at an aggregated level, on a weekly basis.
For each group $k$, the goal is to predict $\hat{y}_{k,w}$ as the sum of the productions of all assets in that group ($\hat{y}_{k,w} = \sum_i \hat{C}_{i,w}$), under the constraint that each asset's production does not exceed its maximum capacity in any week, i.e.
$$0 \le \hat{C}_{i,w} \le C_i^{\max} \quad \forall w$$
$\hat{C}_{i_0,w_0}$ is to be estimated as a function of the measures pertaining to asset $i_0$ and week $w_0$:
$$\hat{C}_{i_0,w_0} = f\left(x^{j}_{i_0,w_0,l}\right) \quad \forall j \in \{1, \dots, J\},\ \forall l \in \{1, \dots, 7\}$$
Notes:
The measurements can only be used to assess the production of the asset on which they were taken.
An asset usually cannot produce more than its nominal capacity, but spikes in production can sometimes reach 120% of the nominal capacity.
The target is reported weekly while the measures are daily. Hence, the measures corresponding to the target for week $w$ are the measures of all the days in week $w$.
Other factors might impact productivity, like economic conditions (i.e. demand-linked curtailment), but as a first approach, we assume such effects are not significant.
Some series might be missing for some assets, in which case they are filled entirely with None values.
These measurements may correspond to incidents or maintenance, which impact the productivity of the assets, possibly with varying significance (e.g. large vs. small incidents).
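The aggregation and capacity constraint described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the asset names, the data, and the model `toy_f` are hypothetical placeholders for the unknown function f, and taking the maximum capacity as 120% of the nominal capacity is one possible reading of the note on production spikes, not something the challenge specifies.

```python
from typing import Callable, Dict, List, Optional

# One week of data for one asset: 7 days, each with J measurements (None = missing).
WeekMeasures = List[List[Optional[float]]]


def clip_to_capacity(raw_estimate: float, c_max: float) -> float:
    """Enforce the constraint 0 <= C_hat <= C_max for one asset and one week."""
    return min(max(raw_estimate, 0.0), c_max)


def predict_group_production(
    measures: Dict[str, WeekMeasures],
    c_max: Dict[str, float],
    groups: Dict[str, List[str]],
    f: Callable[[str, WeekMeasures], float],
) -> Dict[str, float]:
    """Return y_hat for each group: the sum of clipped per-asset estimates."""
    y_hat = {}
    for group, assets in groups.items():
        y_hat[group] = sum(
            clip_to_capacity(f(asset, measures[asset]), c_max[asset])
            for asset in assets
        )
    return y_hat


def toy_f(asset: str, week: WeekMeasures) -> float:
    """Placeholder model (an assumption): 100 minus 10 per unit of measurement."""
    values = [v for day in week for v in day if v is not None]
    return 100.0 - 10.0 * sum(values)


# Hypothetical data: 3 assets, J = 5 measurements per day, 7 days per week.
nominal = {"A1": 100.0, "A2": 50.0, "A3": 80.0}
c_max = {a: 1.2 * c for a, c in nominal.items()}  # assumed 120% spike bound
groups = {"k1": ["A1", "A2"], "k2": ["A3"]}
measures = {
    "A1": [[0.0] * 5 for _ in range(7)],   # raw estimate 100, within capacity
    "A2": [[0.0] * 5 for _ in range(7)],   # raw estimate 100, clipped to 60
    "A3": [[None] * 5 for _ in range(7)],  # mostly missing series
}
measures["A3"][0][0] = 12.0  # raw estimate 100 - 120 = -20, clipped to 0

y_hat = predict_group_production(measures, c_max, groups, toy_f)
```

Clipping after the fact is the simplest way to satisfy the constraint. A model whose output is bounded by construction (for example a scaled sigmoid) would be an alternative design.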
The goal of this challenge is to design an algorithm that reads the consumption index from a valid picture of a meter.
In this dataset, we try to predict the gender of someone based on 40 windows of 2 seconds taken during sleep.
The proposed challenge aims at predicting the return of a stock in the US market using historical data over a recent period of 20 days. The one-day return of a stock $j$ on day $t$ with price $P_j^t$ (adjusted for dividends and stock splits) is given by:
$$R_j^t = \frac{P_j^t}{P_j^{t-1}} - 1$$
In this challenge, we consider the residual stock return, which corresponds to the return of a stock without the market impact. Historical data are composed of residual stock returns and relative volumes, sampled each day during the last 20 business days (approximately one month). The relative volume $V_j^t$ at time $t$ of a stock $j$ among the $n$ stocks is defined by:
$$\bar{V}_j^t = \frac{V_j^t}{\operatorname{median}\left(\{V_j^{t-1}, \dots, V_j^{t-20}\}\right)}, \qquad V_j^t = \bar{V}_j^t - \frac{1}{n} \sum_{i=1}^{n} \bar{V}_i^t$$
where $V_j^t$ on the right-hand side of the first equation is the volume at time $t$ of a stock $j$. We also give additional information about each stock such as its industry and sector.
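As an illustration of the two definitions above, the following Python sketch computes a one-day return and the relative volumes for a small set of stocks. The function names, tickers and numbers are hypothetical; the challenge data is already provided in this processed form, so this is only meant to make the formulas concrete.

```python
from statistics import median
from typing import Dict, List


def one_day_return(prices: List[float], t: int) -> float:
    """Price at t divided by price at t-1, minus 1, for one adjusted price series."""
    return prices[t] / prices[t - 1] - 1.0


def relative_volumes(
    volumes: Dict[str, List[float]], t: int, window: int = 20
) -> Dict[str, float]:
    """Scale each volume by its trailing median, then subtract the cross-sectional mean."""
    if t < window:
        raise ValueError("need at least `window` days of history before t")
    scaled = {
        stock: v[t] / median(v[t - window:t])  # median over days t-window .. t-1
        for stock, v in volumes.items()
    }
    mean_scaled = sum(scaled.values()) / len(scaled)
    return {stock: s - mean_scaled for stock, s in scaled.items()}


# Hypothetical example with two stocks and 20 days of history.
ret = one_day_return([100.0, 102.0], t=1)
volumes = {
    "AAA": [10.0] * 20 + [20.0],  # today's volume is twice its trailing median
    "BBB": [5.0] * 20 + [5.0],    # today's volume equals its trailing median
}
rel = relative_volumes(volumes, t=20)
```

In this toy example the scaled volumes are 2.0 and 1.0, so after removing their mean (1.5) the relative volumes are +0.5 and -0.5.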
The metric considered is the accuracy of the predicted residual stock return sign.
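A minimal sketch of such a metric is given below, assuming a return of exactly zero is counted as non-positive (the challenge's own convention for that edge case is not stated here). The function name and sample values are hypothetical.

```python
from typing import Sequence


def sign_accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of samples where predicted and true returns have the same sign."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    # Assumption: zero is grouped with negative returns.
    hits = sum((a > 0) == (b > 0) for a, b in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical residual returns: the second prediction has the wrong sign.
acc = sign_accuracy([0.01, -0.02, 0.005, -0.001], [0.02, 0.01, 0.001, -0.03])
```

Only the sign of each prediction matters for this score, so the magnitude of the predicted return has no effect on it.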
The goal of the challenge is to predict defects on starter motor production lines. During the assembly of production samples, different values (torques, angles, ...) are measured on different mounting stations. At the end of the line, additional measurements are performed on two test benches in order to isolate defects. As a result, samples are tagged 'OK' or 'KO'. We would like to design a model that could identify such defects before the test bench step.