Challenge Data

Challenges 2019


You can find the presentations (in French) of the challenges on the website of the Collège de France.


Detecting breast cancer metastases
The goal of this challenge is to develop new algorithms to detect metastases in images of patients diagnosed with breast cancer.
Drug-related questions classification
The goal of the Posos challenge is to predict the intent associated with each question.
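As an illustration only (the actual data format is not described here), a minimal intent-classification baseline could look like the sketch below; the file and column names ("questions_train.csv", "question", "intent") are assumptions, not the official format.

```python
# Hypothetical baseline for the intent-classification task.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

train = pd.read_csv("questions_train.csv")   # assumed columns: question, intent

# Character n-grams are robust to misspelled drug names in the questions.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5), min_df=2),
    LogisticRegression(max_iter=1000),
)

scores = cross_val_score(model, train["question"], train["intent"], cv=5)
print("cross-validated accuracy:", scores.mean())
```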
Screening and Diagnosis of esophageal cancer from in-vivo microscopy images
The goal of this challenge is to build an image classifier to assist physicians in the screening and diagnosis of esophageal cancer. Such a tool would have a massive impact on patient management and patient lives.
Predict brain deep sleep slow oscillation
In this dataset, we try to predict whether or not a slow oscillation will be followed by another one in the sham condition, i.e. without any stimulation.
Dynamic Profile Forecasting
We would like to forecast 7 dynamic-profile time series, which model the consumption shape of several mass-market customer groups (residential and small businesses with subscribed power up to 36 kVA), using meteorological and calendar data as well as any other real-time dataset potentially correlated with consumption patterns. These profiles are coefficients (without units) for each half-hour in the dataset. The dataset size depends on the specific profile (data are collected from October 13th, 2013 onwards for residential profiles and from November 1st, 2016 onwards for commercial profiles). This challenge is about forecasting dynamic profile values from their past values and all the components of Enedis’ half-hourly electrical balancing. The testing period lies in the past, from July 1st, 2017 to June 30th, 2018. There are many possible explanatory variables, since consumption patterns are linked to consumers’ behaviour and economic activity: weather conditions (cold spells, heat waves) and business holidays will impact energy consumption, but other factors may also contribute to modifying it.
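As a rough sketch only (not the official baseline), one way to approach such a forecast is to regress each half-hourly profile on calendar and weather features plus lagged values; the file and column names below ("profiles_train.csv", "timestamp", "profile", "temperature") are assumptions about the data layout.

```python
# Hedged sketch of a dynamic-profile forecaster.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

df = pd.read_csv("profiles_train.csv", parse_dates=["timestamp"])

# Calendar features capture weekly and seasonal consumption shapes.
df["halfhour"] = df["timestamp"].dt.hour * 2 + df["timestamp"].dt.minute // 30
df["dayofweek"] = df["timestamp"].dt.dayofweek
df["month"] = df["timestamp"].dt.month
# Lagged profile values: same half-hour one day and one week earlier.
df["lag_1d"] = df["profile"].shift(48)
df["lag_7d"] = df["profile"].shift(48 * 7)
df = df.dropna()

features = ["halfhour", "dayofweek", "month", "temperature", "lag_1d", "lag_7d"]
model = GradientBoostingRegressor().fit(df[features], df["profile"])
```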
Spatiotemporal PM10 concentration prediction

In order to provide air quality forecasts, Plume Labs has built a unique database of readings collected by monitoring stations all over the world. The problem we propose consists in predicting the PM10 readings of some air quality monitoring stations using the readings provided by nearby monitoring stations as well as urban features. For each PM10 reading in the training dataset, we will provide the following information:

  1. Land-use characterization of the monitoring station location (e.g. is it located in a residential area, an industrial area, …)
  2. Readings at the closest monitoring stations

The accuracy obtained by such a prediction model is a very good indicator of how an air quality prediction model performs in locations where there is no monitoring station.
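For illustration (the real feature set will differ), a simple spatial baseline could combine the readings of the closest stations, their distances and land-use features; all column names below are hypothetical.

```python
# Hedged sketch of a PM10 baseline: predict a station's reading from its
# neighbours' readings plus land-use features. Column names are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

data = pd.read_csv("pm10_train.csv")

features = [
    "neighbour_pm10_1", "neighbour_pm10_2", "neighbour_pm10_3",   # nearby readings
    "dist_1", "dist_2", "dist_3",                                 # distances to them
    "landuse_residential", "landuse_industrial", "landuse_green", # land-use shares
]
model = RandomForestRegressor(n_estimators=200, n_jobs=-1)
scores = cross_val_score(model, data[features], data["pm10"],
                         scoring="neg_mean_absolute_error", cv=5)
print("MAE:", -scores.mean())
```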
Exotic pricing with multidimensional non-linear interpolation
The purpose of the challenge is to use a training set of 1 million prices to learn how to price a specific type of instrument, described by 23 parameters, by nonlinear interpolation on these prices. The benefit would be to considerably accelerate computation time while retaining good pricing precision. The exotic option to price is embedded in a callable debt instrument whose final redemption amount, coupon payments and callability are conditional on the performance of a basket of three stocks or equity indices relative to certain barriers. All parameters have been normalized to lie between 0 and 1. The price has also been normalized between 0 and 1; because the zero price sits at the center of the set of pricings, most of the normalized prices are around 0.5.
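As a hedged illustration of nonlinear interpolation on the 23 normalized parameters, one could fit a feed-forward network as a fast surrogate pricer; the file name and layout ("prices_train.csv" with 23 feature columns and a "price" column) are assumptions.

```python
# Sketch of a surrogate pricer: learn price = f(23 normalized parameters).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

data = pd.read_csv("prices_train.csv")               # 23 feature columns + "price"
X = data.drop(columns=["price"]).to_numpy()
y = data["price"].to_numpy()

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, random_state=0)

# Inputs are already in [0, 1], so no extra scaling is applied here.
surrogate = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=500)
surrogate.fit(X_train, y_train)
print("validation R^2:", surrogate.score(X_val, y_val))

# Once trained, pricing a new instrument is a single forward pass, much faster
# than a full valuation of the callable structure.
fast_price = surrogate.predict(X_val[:1])
```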
Historical consumption regression for electricity supply pricing
The goal of the challenge is to predict the electricity consumption of two given sites for a test year, based on the analysis of the correlation between a year of consumption and weather training data. In operational conditions, the new consumption profiles would be integrated into electricity supply pricing analysis.
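A minimal sketch of such a weather-driven regression, assuming the training data contain a timestamp, a temperature and a consumption column per site (column names hypothetical), could be:

```python
# Hedged sketch: regress a site's consumption on temperature and calendar features.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

site = pd.read_csv("site1_train.csv", parse_dates=["timestamp"])
site["hour"] = site["timestamp"].dt.hour
site["dayofweek"] = site["timestamp"].dt.dayofweek
# Heating degrees capture the strong consumption/temperature correlation.
site["heating_degrees"] = (18.0 - site["temperature"]).clip(lower=0)

features = ["hour", "dayofweek", "temperature", "heating_degrees"]
model = RandomForestRegressor(n_estimators=200).fit(site[features], site["consumption"])
```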
Prediction of Sharpe ratio for blends of quantitative strategies

Goal description:

NC’s goal is to find the best allocation among its quantitative strategies every week (more or less 5 trading days), i.e. the combination for which the Sharpe ratio will be the highest over the next 5 trading days. In order to adapt this issue to an ML problem, we have decided to create a challenge consisting in predicting the Sharpe ratio S* of a given combination (w_1, ..., w_7) of strategies, where the Sharpe ratio is slightly modified to avoid near-zero volatility issues. Given the log returns of each strategy i at each time s, a modified Sharpe ratio of the combination (w_1, ..., w_7) is defined for each time t.

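The exact modified formula is not reproduced here. As an illustration only, a standard Sharpe ratio of a weighted combination, with a small constant added to the volatility to avoid near-zero denominators, can be computed as follows; the epsilon value and the 5-day window in the example are assumptions.

```python
# Hedged sketch of an epsilon-regularized Sharpe ratio for a weighted combination
# of strategies; the challenge's exact modification may differ.
# `returns` has shape (n_days, 7) and `weights` has shape (7,).
import numpy as np

def modified_sharpe(returns: np.ndarray, weights: np.ndarray, eps: float = 1e-6) -> float:
    combo = returns @ weights          # log returns of the combination
    vol = combo.std(ddof=1)            # sample volatility
    return combo.mean() / (vol + eps)  # eps guards against near-zero volatility

# Example: equally weighted combination of 7 strategies over 5 trading days.
rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.01, size=(5, 7))
print(modified_sharpe(r, np.full(7, 1 / 7)))
```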

Crack the neural code of the brain

The challenge goal is to classify the brain activity state of an animal based on the spiking activity patterns of its individual neurons. For this purpose, participants are given recordings of neural spike sequences from the hippocampi of rats. Each spiking sequence in the dataset has a corresponding activity state label (two brain states, labeled STATE1 or STATE2). This is, therefore, a binary classification problem, where each data sample is a time series and participants have to predict which class a given time series sample belongs to.

Join our Slack at bit.do/neuralcode to connect with us and other challenge participants and discuss the challenge.
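One common way to turn spike sequences into classifier inputs, sketched here under assumed data shapes (spike times in seconds, fixed-length windows), is to bin each neuron's spike train into firing-rate counts and classify the binned pattern.

```python
# Hedged sketch: bin spike times into per-neuron counts, then classify the
# binned pattern as STATE1 vs STATE2. Window length, number of neurons and
# data shapes are assumptions, not the documented dataset format.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def bin_spikes(spike_times, window=1.0, n_bins=20):
    """Histogram one neuron's spike times (in seconds) over a fixed window."""
    counts, _ = np.histogram(spike_times, bins=n_bins, range=(0.0, window))
    return counts

def featurize(trials):
    """Concatenate per-neuron binned counts into one feature vector per trial."""
    return np.array([np.concatenate([bin_spikes(neuron) for neuron in trial])
                     for trial in trials])

# Tiny synthetic example: 10 trials of 3 neurons each, random spike times.
rng = np.random.default_rng(0)
trials = [[np.sort(rng.uniform(0, 1, rng.integers(5, 30))) for _ in range(3)]
          for _ in range(10)]
labels = ["STATE1", "STATE2"] * 5

clf = RandomForestClassifier(n_estimators=100).fit(featurize(trials), labels)
```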

Optimizing well-being at work
This challenge proposes to develop machine-learning-based approaches to predict individuals' comfort from several time series of environmental data obtained from sensors in a large building. The objective is to learn a classifier that uses these time series as inputs to predict the associated comfort class, computed as the average of the comfort classes of all individuals in the building, who are assumed to experience the same environmental conditions.
Building Claim Prediction

The goal of the challenge is to predict whether a building will have an insurance claim during a certain period. You will have to predict the probability of having at least one claim over the insured period of a building. The model will be based on the building characteristics. The target variable is:

  • 1 if the building has at least one claim over the insured period.
  • 0 if the building does not have any claim over the insured period.

During this challenge, you are encouraged to use external data, for instance: the number of shops per INSEE code (a geographical code), the unemployment rate per INSEE code, weather data, etc.

Some data can be found on the following website: data.gouv.fr
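A hedged sketch of a claim-probability baseline is given below; the file and column names ("buildings_train.csv", "claim") and the log-loss scoring are assumptions, not the official evaluation setup.

```python
# Hedged sketch: predict the probability of at least one claim per building.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

train = pd.read_csv("buildings_train.csv")
X = pd.get_dummies(train.drop(columns=["claim"]))   # one-hot encode characteristics
y = train["claim"]                                  # 1 = at least one claim, 0 = none

clf = HistGradientBoostingClassifier()
# Probabilistic targets are usually scored with log loss or AUC.
print(cross_val_score(clf, X, y, scoring="neg_log_loss", cv=5).mean())

# External data (e.g. shop counts or unemployment rate per INSEE code) could be
# merged on a geographical key before fitting, as suggested above.
```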

Solve 2x2x2 Rubik's cube
The goal is to design an automatic Rubik's Cube analyzer that estimates the length of the shortest path from the current configuration to the solution. Algebraic manipulations of this type could be used in other contexts to solve complex problems. Given a new, unseen configuration of the 2x2x2 Rubik's Cube, the challenge is to predict the length of the shortest path to the solution.
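For illustration only, a supervised approach could one-hot encode the 24 sticker colours of a configuration and regress the distance to the solved state; the encoding and file name below are assumptions about the data format, not the official one.

```python
# Hedged sketch: predict the distance to the solved state of a 2x2x2 cube
# from its sticker colours (24 stickers, 6 colours assumed).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

data = pd.read_csv("cube_train.csv")        # assumed: 24 sticker columns + "distance"
X = data.drop(columns=["distance"])         # each column holds a colour id 0..5
y = data["distance"]                        # shortest-path length to the solution

model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"), # one-hot encode sticker colours
    RandomForestRegressor(n_estimators=300),
)
print(cross_val_score(model, X, y, scoring="neg_mean_absolute_error", cv=5).mean())
```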
Prediction of daily stock movements on the US market
The goal of this challenge is to predict the sign of the return (i.e. the price change over some time interval) at the end of about 700 days for about 700 stocks.