Welcome to the Challenge Data website of ENS and Collège de France

We organize data science challenges based on data provided by public services, companies and laboratories: see the general documentation and FAQ. The prize ceremony is held in February at the Collège de France.

For participants

Guide to create an account, choose your challenges and submit solutions.

For professors

Guide to create a course project from selected challenges and to follow student progress.

For Challenge providers

Guide to submit a challenge for the next season.

About us

This website is managed by the Data team of the École Normale Supérieure of Paris in partnership with the Collège de France. It is supported by the CFM chair and the PRAIRIE Institute.

Challenges 2021

Stock trading: prediction of auction volumes

The goal of this year's challenge is to predict the volume (total value of stock exchanged) available for auction, for 900 stocks over about 350 days.

Detecting Sleep Apnea from raw physiological signals

The goal of this challenge is to build a model to automatically detect sleep apnea events from PSG data.

You will have access to samples from 44 nights recorded by polysomnography (PSG) and scored for apnea events by a consensus of human experts. For each of the 44 nights, 200 non-overlapping windows are sampled with their associated labels (binary segmentation masks). Each window contains 90 seconds of signal from 8 physiological channels sampled at 100 Hz:

  • Abdominal belt: Abdominal contraction
  • Airflow: Respiratory Airflow from the subject
  • PPG (Photoplethysmogram): Cardiac activity
  • Thoracic belt: Thoracic contraction
  • Snoring indicator
  • SPO2: O2 saturation of the blood
  • C4-A1: EEG derivation
  • O2-A1: EEG derivation

The segmentation mask is sampled at 1 Hz and contains 90 labels (0 = no event, 1 = apnea event). Both examples can be reproduced using visualization.py, provided in the supplementary files.
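As a quick check on the sizes described above: a window holds 90 s × 100 Hz = 9000 samples per signal, while the mask holds 90 labels, so each label covers 100 consecutive samples. The minimal sketch below expands a 1 Hz mask to the signal resolution. The function name, the constants and the example window are illustrative assumptions; they are not taken from the supplementary files.

```python
SIGNAL_HZ = 100       # sampling rate of the physiological signals (from the text)
MASK_HZ = 1           # sampling rate of the segmentation mask (from the text)
WINDOW_SECONDS = 90   # duration of one window (from the text)


def upsample_mask(mask, factor=SIGNAL_HZ // MASK_HZ):
    """Repeat each 1 Hz label so the mask aligns sample by sample with a signal."""
    return [label for label in mask for _ in range(factor)]


# Hypothetical window with one apnea event covering seconds 10 to 14 (inclusive).
mask = [0] * 10 + [1] * 5 + [0] * 75
sample_mask = upsample_mask(mask)
samples_per_signal = WINDOW_SECONDS * SIGNAL_HZ
```

With this alignment, `sample_mask[i]` gives the label of the `i`-th sample of any of the 8 signals.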

The 8 PSG signals with the associated segmentation mask. The apnea event is visible as the abdominal belt, thoracic belt and airflow amplitudes drop noticeably below baseline. The SPO2 drops after the event.

The 8 PSG signals with the associated segmentation mask. Two short apnea events are visible with the associated breathing disruption. The SPO2 drop during the second event is likely a consequence of the first event.

We want to assess whether the events detected by the algorithm agree with the ones detected by the sleep experts.


As we seek to evaluate event-wise agreement between the model and the scorers, the metric cannot be computed directly on the segmentation mask. First, events are extracted from the binary mask with the following rule:

An apnea event is a sequence of consecutive 1s in the binary mask.

For each apnea event in a window, we extract the start and end index to produce a list of events. This list can be empty if no events are found. The same processing is applied to the ground-truth masks to extract the ground-truth events.
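The extraction rule above can be sketched as follows. This is a minimal illustration, not the code of the metrics file: the function name is hypothetical, and the choice of an exclusive end index is an assumption made here for convenience.

```python
def extract_events(mask):
    """Return a list of (start, end) index pairs, one per run of consecutive 1s.

    `end` is exclusive, which is an assumption of this sketch.
    """
    events = []
    start = None
    for i, label in enumerate(mask):
        if label == 1 and start is None:
            start = i                      # a new event begins here
        elif label != 1 and start is not None:
            events.append((start, i))      # the event ended at the previous index
            start = None
    if start is not None:                  # event still running at the end of the window
        events.append((start, len(mask)))
    return events
```

A mask without any 1 yields an empty list, as described in the text.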

In order to assess the agreement between the ground-truth and estimated events, the F1-score is computed. Two events match if their IoU (intersection over union, or Jaccard index) is above 0.3.

Hence a detected event is a True Positive if it matches a ground-truth event, and a False Positive otherwise. Conversely, a ground-truth event without a matching detected event is a False Negative. TP, FP and FN are summed over all the windows to compute the F1-score.
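A possible reading of this matching rule is sketched below. It applies the stated rule literally and does not enforce one-to-one matching, so the official metrics file may differ in such details. Events are assumed to be (start, end) pairs with an exclusive end, and all names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two events given as (start, end), end exclusive."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def count_matches(detected, truth, threshold=0.3):
    """Count TP, FP and FN for one window, following the rule stated in the text."""
    tp = sum(1 for d in detected if any(iou(d, t) > threshold for t in truth))
    fp = len(detected) - tp
    fn = sum(1 for t in truth if not any(iou(d, t) > threshold for d in detected))
    return tp, fp, fn


def f1_over_windows(windows, threshold=0.3):
    """Sum TP, FP and FN over (detected, truth) pairs, then compute a single F1."""
    tp = fp = fn = 0
    for detected, truth in windows:
        w_tp, w_fp, w_fn = count_matches(detected, truth, threshold)
        tp, fp, fn = tp + w_tp, fp + w_fp, fn + w_fn
    denominator = 2 * tp + fp + fn
    # Returning 0.0 when there is nothing to score is a choice of this sketch.
    return 2 * tp / denominator if denominator > 0 else 0.0


windows = [
    ([(10, 20)], [(12, 22)]),   # IoU = 8 / 12, above 0.3: one TP
    ([(40, 45)], [(60, 70)]),   # no overlap: one FP and one FN
    ([], []),                   # empty window: contributes nothing
]
score = f1_over_windows(windows)
```

In this hypothetical example the totals are TP = 1, FP = 1 and FN = 1, so the F1-score is 2 / (2 + 1 + 1) = 0.5.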

The detailed implementation can be found in the metrics file.

Rakuten Multi-modal Colour Extraction

The goal of this data challenge is to predict the "colour" of a product, given its image, title, and description. A product can be of multiple colours, making it a multi-label classification problem.

For example, the Rakuten Ichiba catalog contains a product with the Japanese title タイトリスト プレーヤーズ ローラートラベルカバー (Titleist Players Roller Travel Cover), associated with an image and sometimes with an additional description. The colour of this product is annotated as Red and Black. Other products have different titles and images, possibly descriptions, and their own colour attribute tags. Given this information on the products, as in the example above, this challenge proposes to build a multi-label classifier that assigns each product its corresponding colour attributes.


The metric used in this challenge to rank the participants is the weighted-F1 score.

The scikit-learn package has an F1 score implementation that can be used for this challenge with its average parameter set to "weighted".
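To make the averaging explicit, here is a dependency-free sketch of a support-weighted F1 on multi-label data: one F1 per colour, averaged with weights equal to the number of true instances of each colour. It is meant to illustrate what the "weighted" average does and is not the official scorer. The label subset, the helper names and the example products are hypothetical.

```python
LABELS = ["Red", "Black", "White"]  # hypothetical subset of colour tags


def to_indicator(tags, labels=LABELS):
    """Encode a set of colour tags as a 0/1 vector, one position per label."""
    return [1 if label in tags else 0 for label in labels]


def weighted_f1(y_true, y_pred):
    """Per-label F1 averaged with weights equal to each label's support."""
    n_labels = len(y_true[0])
    total_support = 0
    weighted_sum = 0.0
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator > 0 else 0.0
        support = tp + fn                  # number of true instances of label j
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support > 0 else 0.0


y_true = [to_indicator({"Red", "Black"}), to_indicator({"White"}), to_indicator({"Red"})]
y_pred = [to_indicator({"Red"}), to_indicator({"White"}), to_indicator({"Red", "Black"})]
score = weighted_f1(y_true, y_pred)
```

Here Red (support 2) and White (support 1) are predicted perfectly while Black (support 1) is always wrong, so the score is (2 × 1 + 1 × 0 + 1 × 1) / 4 = 0.75.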

Land cover predictive modeling from satellite images

This challenge tackles the problem of land cover modeling. The goal of this challenge is to predict the proportion of classes of land cover for an input satellite image, in which every pixel is assigned to a land cover label.

Assessing uncertainty in air quality predictions

This data challenge aims at introducing a new statistical model to predict and analyze air quality in large buildings using observations stored in the Oze-Energies database. Physics-based approaches, which build air quality simulation tools to reproduce complex building behaviors, are widespread in the most complex situations. The main drawbacks of such software for simulating the behavior of transient systems are:

  • the significant computational time required to run such models as they integrate many noisy sources and a huge number of parameters and require essentially massive thermodynamics computations;
  • the fact that they often solely output a single-point estimate at each time, without providing any uncertainty measures to assess their confidence about their predictions.

In order to analyze and predict future air quality, so as to alert and correct building management systems and ensure comfort and satisfactory sanitary conditions, this challenge aims at solving the second issue, i.e. at designing models which take into account the uncertainty in the exogenous data describing external weather conditions and the occupancy of the building. This makes it possible to provide confidence intervals on the air quality predictions, here on the humidity of the air inside the building.
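One simple way to turn input uncertainty into an interval on the prediction is Monte Carlo propagation: draw noisy versions of the exogenous inputs, run the model on each draw, and read empirical quantiles. The sketch below is only an illustration of that idea under strong assumptions. The toy humidity model, its coefficients, the Gaussian noise and all names are hypothetical and are not part of the challenge.

```python
import random
import statistics


def toy_humidity_model(outdoor_temperature, occupancy):
    """Hypothetical stand-in for a predictive model of indoor relative humidity (%)."""
    return 40.0 + 0.3 * outdoor_temperature + 0.05 * occupancy


def prediction_interval(temperature, occupancy, temperature_sd, occupancy_sd,
                        n_draws=2000, coverage=0.90, seed=0):
    """Propagate Gaussian noise on the inputs and return (lower, median, upper)."""
    rng = random.Random(seed)
    draws = sorted(
        toy_humidity_model(rng.gauss(temperature, temperature_sd),
                           rng.gauss(occupancy, occupancy_sd))
        for _ in range(n_draws)
    )
    tail = (1.0 - coverage) / 2.0
    lower = draws[int(tail * (n_draws - 1))]          # empirical lower quantile
    upper = draws[int((1.0 - tail) * (n_draws - 1))]  # empirical upper quantile
    return lower, statistics.median(draws), upper


lower, median, upper = prediction_interval(15.0, 120.0,
                                           temperature_sd=2.0, occupancy_sd=20.0)
```

The interval widens as the assumed input noise grows, which is the behavior a single-point estimate cannot express.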

Interpreting neural networks predictions for multi-label classification of music catalogs.

More precisely, our thesaurus comprises a few hundred tags (e.g. blues-rock, electric-guitar, happy), grouped in classes (Genres, Instruments or Moods) and partitioned into categories (genres-orchestral, instruments-brass, mood-dramatic, etc.). Each audio track of our database may be tagged with one or more labels of each class, so the auto-tagging process is a multi-label classification problem; we can train neural networks to learn from audio features and generate numerical predictions that minimise the binary cross entropy with respect to the one-hot encoded labelling of the dataset.

On the other hand, to display the tagging on our front-end, we require a discrete, tag-wise labelling, so a further interpretation is needed to convert the predictions into decisions, and we can use more suitable metrics to evaluate the quality of the tagging. We want the participants of the challenge to optimise this decision problem, leveraging all the information available from the ground truth and the global predictions to design a selection algorithm producing the most consistent labelling. In other words, build a multi-label classifier that receives, as input, the predictions generated by our neural networks for all tags and their categories.

Our suggested benchmark is a column-wise thresholding (see details below), so this strategy uses neither the categorical predictions nor the possible correlations between tags. For example, a more row-oriented approach (for each track, select a tag based on its prediction value relative to the predictions for the other tags) or a hierarchical strategy (decide on categories first, then choose tags among the selected categories) may improve the final classifications.
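The difference between the two families of strategies can be sketched as follows. This is a hedged illustration only: the exact benchmark is not reproduced here, the use of ">=" and of a top-k rule are assumptions, and the scores, thresholds and function names are hypothetical.

```python
def columnwise_threshold(predictions, thresholds):
    """Column-wise decision: tag j is kept when its score reaches thresholds[j]."""
    return [[1 if score >= thresholds[j] else 0 for j, score in enumerate(row)]
            for row in predictions]


def rowwise_top_k(predictions, k=2):
    """Row-oriented alternative: keep the k highest-scoring tags of each track."""
    decisions = []
    for row in predictions:
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(ranked[:k])
        decisions.append([1 if j in keep else 0 for j in range(len(row))])
    return decisions


# Hypothetical scores for 2 tracks and 3 tags (e.g. blues-rock, electric-guitar, happy).
predictions = [[0.9, 0.4, 0.2],
               [0.3, 0.35, 0.1]]
thresholds = [0.5, 0.5, 0.5]  # one threshold per tag; illustrative values only

by_column = columnwise_threshold(predictions, thresholds)
by_row = rowwise_top_k(predictions, k=2)
```

In this example the second track receives no tag under column-wise thresholding because all of its scores are low, whereas the row-oriented rule still selects its two relatively strongest tags.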

Reconstruction of Liquid Asset Performance

If we find an illiquid asset to be untradeable, then the signal of this asset should not result in a trading position. To counteract this difficulty, an alternative would be to project the signals from illiquid assets to liquid ones.

To do so, the proposed challenge aims at determining the link at a given time $t$ between the returns of illiquid and liquid assets. The one-day return of a stock $j$ at a time $t$ with price $P_j^t$ (adjusted for dividends and stock splits) is defined as:

$$R_j^t = \frac{P_j^t}{P_j^{t-1}} - 1$$
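As a small worked example of this definition, the sketch below computes one-day returns from a series of adjusted prices. The function name and the price values are hypothetical.

```python
def one_day_returns(prices):
    """Return R^t = P^t / P^(t-1) - 1 for t = 1, ..., len(prices) - 1."""
    return [current / previous - 1.0
            for previous, current in zip(prices, prices[1:])]


# Hypothetical adjusted prices of one stock over four consecutive days.
prices = [100.0, 102.0, 99.96, 99.96]
returns = one_day_returns(prices)
```

For these prices the returns are approximately +2%, -2% and 0%; the first day has no return because there is no previous price.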

Let $\mathbf Y^t = (R_1^t, \dots, R_L^t)$ be the returns of $L$ liquid assets and $\mathbf X^t = (R_1^t, \dots, R_N^t)$ be the returns of $N$ illiquid assets at a given time $t$. The objective of this challenge is to determine a mapping function $\eta: \mathbb R^N \rightarrow \mathbb R^L$ that would link the returns of $N$ illiquid assets to the returns of $L$ liquid assets such that $\mathbf Y^t = \eta(\mathbf X^t)$.

Since predictive signals can be seen as estimated returns, the signals generated by QRT on $N$ illiquid assets, defined by $\hat{\mathbf X}^t$, can be mapped to projected signals $\hat{\mathbf Y}^t$ on $L$ liquid instruments such that $\hat{\mathbf Y}^t = \eta(\hat{\mathbf X}^t)$. However, since $\eta$ is purely theoretical, the mapping must rely on approximations. Therefore, the idea would be to estimate a model $\hat \eta$ that would predict the returns of $L=100$ liquid instruments, using the returns of $N=100$ illiquid instruments, given historical data.

The model $\hat \eta$ can then be seen as a multi-output prediction of $L$ returns, or as the combination of $L$ models $\hat \eta_j$, for $j = 1, \dots, L$, that would individually predict the return of each liquid instrument $j$.

For simplicity and practical reasons, we chose to transform this challenge into a classification problem. In practice, we are more interested in being right on the trend than on the value. Thus, instead of predicting the returns of the liquid assets, the estimated model $\hat \eta$ shall predict the signs of these returns.

The metric used to judge the quality of the predictions is a custom weighted accuracy defined by:

$$f(\mathbf y, \hat{\mathbf y}) = \frac{1}{\| \mathbf y\|_1} \sum_{i=1}^n |y_i| \times 1_{\hat y_i = \mathrm{sign}(y_i)}$$

where $1_{\hat y_i = \mathrm{sign}(y_i)}$ is equal to 1 if the $i$-th prediction $\hat y_i \in \{-1, 1\}$ is of the same sign as the $i$-th true value $y_i$. This metric gives more importance to the good classification of high value returns. Indeed, it can be more important to be right on a 7% move than on a 0.5% move.
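The metric can be sketched in a few lines. This is an illustration rather than the official scorer: the treatment of a zero return as positive and the value returned when all true returns are zero are assumptions of this sketch, and the example values are hypothetical.

```python
def sign(value):
    """Sign in {-1, 1}; mapping zero to +1 is an assumption of this sketch."""
    return 1 if value >= 0 else -1


def weighted_accuracy(y_true, y_pred):
    """Share of the total absolute return whose sign is predicted correctly."""
    l1_norm = sum(abs(y) for y in y_true)
    if l1_norm == 0:
        return 0.0  # undefined in the formula; returning 0.0 is a choice of this sketch
    hits = sum(abs(y) for y, y_hat in zip(y_true, y_pred) if y_hat == sign(y))
    return hits / l1_norm


# Hypothetical returns: a 7% move, a -0.5% move and a 1% move, all predicted "up".
y_true = [0.07, -0.005, 0.01]
y_pred = [1, 1, 1]
score = weighted_accuracy(y_true, y_pred)
```

Only the -0.5% move is misclassified, so the score is 0.08 / 0.085 ≈ 0.94, whereas the unweighted accuracy would be 2/3.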

EV Charging Stations Usage

The objective of this challenge is to design a model capable of predicting the usage of some EV charging stations in Paris, more specifically the times when they are available, actively charging a car, plugged, offline or down.

Who are the high-frequency traders?

The goal of the challenge is to classify traders into three categories: HFT, non-HFT and MIX.

According to the AMF's in-house expert-based classification, which draws on the AMF's knowledge of market players, these players are divided into three categories: HFT, MIX and non-HFT.

From a set of behavioural variables based on order and transaction data, the challenger is invited to predict the category to which a given participant belongs.

The proposed classification algorithm will then be applied to other data sources for which market players are currently not well known by the AMF.


The goal of this challenge is to confirm the presence of defects on parts, based on pictures taken during the production of Power Modules at the Valeo plant in Sablé-sur-Sarthe.

During module assembly, an “automatic optical inspection” (AOI) is done after a wire bonding process to check the conformity and quality of the parts. This inspection is based on pictures taken by a camera and on basic algorithms used to measure some specific parameters on the parts. The AOI machine is effective at measuring dimensions on the parts (the width of a bonding wire, for example) but much less so for “aspect” defects. The difficulty of properly analyzing this type of defect leads to a large number of parts that must be confirmed manually by operators. In certain conditions, the rate of “false defects” (parts considered KO by the machine but OK by the operator) can reach 10 or 20% of the production.
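The “false defect” rate mentioned above can be made concrete with a short computation. The sketch assumes the rate is measured over all produced parts, as the phrase “of the production” suggests; the function name and the example batch are hypothetical.

```python
def false_defect_rate(machine_ok, operator_ok):
    """Fraction of all parts rejected by the machine but accepted by the operator."""
    if len(machine_ok) != len(operator_ok):
        raise ValueError("both label lists must describe the same parts")
    false_defects = sum(1 for m, o in zip(machine_ok, operator_ok) if not m and o)
    return false_defects / len(machine_ok) if machine_ok else 0.0


# Hypothetical batch of 10 parts: True means OK, False means KO.
machine_ok = [True, False, True, False, True, True, True, False, True, True]
operator_ok = [True, True, True, False, True, True, True, True, True, True]
rate = false_defect_rate(machine_ok, operator_ok)
```

In this batch the machine rejects three parts and the operator confirms only one of them as defective, so two parts out of ten (20%) are false defects.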

The target of this challenge is to define a model that could provide a better result than the AOI in discriminating between good and bad parts for aspect defects. For this analysis, we would like to focus only on bonding with thin wire (200 µm).

Sinusoid segmentation in subsurface images

In the input 2D wellbore data, the formation boundaries are represented by sinusoids capturing the azimuth* and amplitude* of the dip. Detecting and segmenting the sinusoids manually is a tedious, time-consuming task that can take experts up to several hours for one well. Therefore, we aim to leverage machine learning and deep learning to automatically detect those dips (sinusoids) and segment them, not only to save time but also to improve segmentation performance. In this project, we suggest developing a deep learning approach to segment the dips given an input electromagnetic image map. The size of the model is important, so it will be an aspect to consider in this data challenge.

* amplitude: the magnitude of the sinusoid.
* azimuth: the horizontal position of a given point, counted from the left side of the image block. It is generally expressed as an angle (the entire width of the image represents 360° around the wellbore).
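The azimuth convention described above (the full image width spans 360°) can be illustrated with a short sketch. The conversion follows directly from that convention; the sinusoid parametrization (a center row, an amplitude and a phase) is an assumption made here for illustration, and all names and values are hypothetical.

```python
import math


def column_to_azimuth(column, image_width):
    """Map a pixel column (0 = left edge) to an azimuth in degrees in [0, 360)."""
    return 360.0 * column / image_width


def sinusoid_row(column, image_width, center_row, amplitude, phase_deg=0.0):
    """Row of a hypothetical dip sinusoid at a given column of the unrolled image."""
    azimuth = math.radians(column_to_azimuth(column, image_width) + phase_deg)
    return center_row + amplitude * math.sin(azimuth)


width = 360                       # hypothetical image width in pixels
quarter_turn = column_to_azimuth(90, width)
peak_row = sinusoid_row(90, width, center_row=50.0, amplitude=10.0)
```

With a 360-pixel-wide image, column 90 corresponds to an azimuth of 90°, where this example sinusoid reaches its extreme row, one amplitude away from the center row.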