Challenges of the 2022 season

Can you predict the tide?

Participants will have to forecast the sea surges in two western European coastal cities.

We place ourselves in a forecasting setup: knowing the surge values and the sea-level pressure field over the last five days, we want to predict the surge values over the next five days. This is hence a time series prediction problem. The signals we consider are:

  • the surge, which is a function of time;
  • the sea-level pressure, which is a function of time, latitude and longitude.

The score $\ell(\hat{y}, y)$ we use to measure the quality of the prediction $\hat{y}$ against the true values $y$ is a weighted version of the mean squared error (MSE). The weights depend linearly on the forecast time, with a larger weight for the first forecast time and a smaller weight for the last one. The predictions for the two cities are scored independently, and the final loss is their sum:

import numpy as np


def surge_prediction_metric(y_true, y_pred):
    # Linearly decreasing weights: 1.0 for the first forecast time, 0.1 for the last.
    w = np.linspace(1, 0.1, 10)[np.newaxis]
    surge1_cols = [
        'surge1_t0', 'surge1_t1', 'surge1_t2', 'surge1_t3', 'surge1_t4',
        'surge1_t5', 'surge1_t6', 'surge1_t7', 'surge1_t8', 'surge1_t9']
    surge2_cols = [
        'surge2_t0', 'surge2_t1', 'surge2_t2', 'surge2_t3', 'surge2_t4',
        'surge2_t5', 'surge2_t6', 'surge2_t7', 'surge2_t8', 'surge2_t9']
    # Weighted MSE over the 10 forecast times, computed independently for each city.
    surge1_score = (w * (y_true[surge1_cols].values - y_pred[surge1_cols].values) ** 2).mean()
    surge2_score = (w * (y_true[surge2_cols].values - y_pred[surge2_cols].values) ** 2).mean()
    return surge1_score + surge2_score

Since the surge values are normalised (zero mean and unit standard deviation), $1 - \ell$ can be seen as a percentage of explained variance. With a trivial prediction of zero for all values, the score is $\ell \approx 1$, meaning that we explain 0 % of the variance. A score greater than one is hence worse than the zero prediction and can be considered "bad".
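
As a quick sanity check, here is a minimal usage sketch of the metric function defined above, with purely illustrative pandas DataFrames: the "ground truth" is filled with standard-normal values to mimic the normalised surges, and the prediction is the trivial zero baseline.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
cols = [f'surge{c}_t{t}' for c in (1, 2) for t in range(10)]

# Illustrative "ground truth": standard-normal values mimicking the normalised surges.
y_true = pd.DataFrame(rng.standard_normal((1000, 20)), columns=cols)
# Trivial baseline: predict zero everywhere.
y_pred = pd.DataFrame(np.zeros((1000, 20)), columns=cols)

# Uses surge_prediction_metric defined above.
print(surge_prediction_metric(y_true, y_pred))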

Prediction of missing Bid-Ask spread values

The goal of this challenge is to recover missing values in financial time series covering 300 futures contracts.

The financial time series in question contain "daily bid-ask spreads", i.e. a daily average of the bid-ask spreads observed throughout the day. The bid-ask spread of a given asset is the difference between the lowest price a seller will accept (lowest ask price) and the highest price a buyer is ready to pay (highest bid price).

A large price difference (large spread) between bid and ask reflects the fact that only a few participants are ready to sell or to buy (as they make no effort to quote prices closer to what the other side would accept). Conversely, a small bid-ask spread reflects a liquid market in which participants are more willing to trade. Anticipating the average bid-ask spread of the next trading day is thus important to know how much one can expect to trade during that day.

In this challenge, we removed the daily spread for some days of a 10-year history, for each of about 300 futures contracts.

The goal of the challenge is to predict these missing values using the other features of the time series.

Participants can ask questions, find answers and share their findings on the official CFM Data Challenge forum.

Data Centric Movie Reviews Sentiment Classification

This is a data-centric challenge. You will have to submit the best data, not the best model.

Data-centric challenge

The interest of the challenge lies in the training pipeline being kept fixed: instead of improving a machine learning model or training pipeline, you will have to select, augment, clean, generate, source, etc. a training dataset, starting from a given dataset. In fact, you are allowed to feed anything to the model.

Movie reviews

The underlying machine learning task is to predict the sentiment (positive or negative) of movie reviews. You won't be able to choose the model, nor will you have to create complex ensembles of models that add no real value. To let you iterate quickly on your experiments, we provide the training script, which uses a rather simple model, fastText.
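
As an illustration, here is a minimal sketch of a fastText supervised training run of the kind such a script performs; the file names and hyperparameters below are assumptions, not the official pipeline. Each training line carries a __label__ prefix followed by the review text.

import fasttext

# Each line of train.txt looks like: "__label__positive I loved this movie ..."
# (file names and hyperparameters are illustrative, not the official pipeline)
model = fasttext.train_supervised(
    input="train.txt",   # your submitted dataset, exported to fastText format
    epoch=5,
    lr=0.1,
    wordNgrams=2,
)

# Evaluate on a held-out file in the same format: returns (n_samples, precision, recall).
print(model.test("valid.txt"))

# Predict the sentiment of a single review.
print(model.predict("A touching story with wonderful acting"))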

What happens when you make a submission?

Submissions are datasets of up to 20k movie reviews. When you submit a dataset, it is sent to our servers and used to train a model with the same pipeline as the one provided. The model trained on your dataset is then evaluated on a separate test set of movie reviews, and the performance is measured by the accuracy on this test set. The test set is kept hidden, because otherwise you could simply submit the test set itself and let the model overfit on it. We will reveal a small fraction of this test set (a few dozen texts) to give a sense of the test data distribution.

Regional Climate Forecast 2022

How accurately can we predict regional temperature anomalies based on past and neighbouring climate observations?

Semantic segmentation of industrial facility point cloud

The goal of this challenge is to perform a semantic segmentation of a 3D point cloud.

The point cloud of one EDF industrial facility digital mock-up is composed of 45 billion 3D points. The reconstruction work consists in fitting 90,000 geometric primitives to the point cloud. To perform this task, operators have to manually segment the part of the point cloud corresponding to a piece of equipment and then fit the suitable geometric primitive. This manual segmentation is the most tedious step of the overall production scheme. Therefore, EDF R&D is studying solutions to perform it automatically.

Figure: as-built digital mock-up creation with CAD reconstruction from a point cloud.

Because EDF industrial facilities are sensitive and hardly accessible or available for experiments, our team works with the boiling room of EDF Lab Saclay. The digital mock-up of this test environment has been produced with the same methodology as the other industrial facilities.

For the ENS challenge, EDF provides a dataset with a cloud of 2.1 billion points acquired in an industrial environment, the boiling room of EDF Lab Saclay, whose design is sufficiently close to an industrial building for this segmentation task. Each point of the cloud has been manually given a ground-truth label.

The project's purpose is a semantic segmentation task on a 3D point cloud. It consists in training a machine learning model $f$ to automatically segment the point cloud $x=(x_i)_{1 \leq i \leq N}$ into different classes $y=(y_i)_{1 \leq i \leq N}$, where $N$ is the point cloud size. The model infers a class label $y_i = f(x_i)$ for each point $x_i$.

To assess the results, we compute the weighted F1-score over all $C$ classes (sklearn.metrics.f1_score). It is defined by:

$$F_1 := \sum_{i=0}^{C-1} w_i \, \frac{2 \, P_i \times R_i}{P_i + R_i},$$

where $P_i$ and $R_i$ are respectively the point-wise precision and recall of class $i$, and $w_i$ is the inverse of the number of true instances for class $i$.
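
For reference, a minimal sketch of how such a score can be computed with scikit-learn, assuming the per-point predictions and ground-truth labels are available as flat integer arrays; the average='weighted' option is an assumption matching the cited function, not necessarily the exact challenge weighting.

import numpy as np
from sklearn.metrics import f1_score

# Illustrative labels: one integer class id per 3D point.
y_true = np.array([0, 0, 1, 2, 2, 2, 1, 0])
y_pred = np.array([0, 1, 1, 2, 2, 0, 1, 0])

# Weighted F1 over all classes, as computed by sklearn.metrics.f1_score.
score = f1_score(y_true, y_pred, average='weighted')
print(score)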

Learning factors for stock market returns prediction

The goal of this challenge is to design/learn factors for stock return prediction using the exotic parameter space introduced in the context section.

Participants will be able to use a three-year data history of 50 stocks from the same stock market (training data set) to provide the model parameters $(A,\beta)$ as outputs. The predictive model associated with these parameters will then be tested on its ability to predict the returns of 50 other stocks over the same three-year time period (testing data set).

We allow $D=250$ days for the time depth and $F=10$ for the number of factors.

Metric. More precisely, we assess the quality of the predictive model with parameters $(A,\beta)$ as follows. Let $\tilde R_t\in\mathbb R^{50}$ be the returns of the 50 stocks of the testing data set over the three-year period ($t=0,\ldots,753$) and let $\tilde S_{t} = \tilde S_{t}(A,\beta)$ be the participants' predictor for $\tilde R_{t}$. The metric to maximize is defined by

$$\mathrm{Metric}(A,\beta) := \frac{1}{504}\sum_{t=250}^{753} \frac{\langle \tilde S_{t}, \tilde R_{t}\rangle}{\|\tilde S_{t}\|\,\|\tilde R_{t}\|}$$

if $|\langle A_i,A_j\rangle-\delta_{ij}|\leq 10^{-6}$ for all $i,j$, and $\mathrm{Metric}(A,\beta):=-1$ otherwise.

By construction the metric takes its values in $[-1,1]$ and equals $-1$ as soon as there exists a pair $(i,j)$ that violates the orthonormality condition beyond the tolerance.

Output structure. The output expected from the participants is a vector in which the model parameters $A=[A_1,\ldots,A_{10}]\in\mathbb R^{250\times 10}$ and $\beta\in\mathbb R^{10}$ are stacked as follows:

$$\text{Output} = \left[\begin{matrix} A_1 \\ \vdots \\ A_{10} \\ \beta \end{matrix}\right]\in\mathbb R^{2510}$$
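
As an illustration, a minimal NumPy sketch of the output stacking and scoring logic, assuming the predictions $\tilde S_t$ have already been built from $(A,\beta)$ according to the model of the context section; the array names and shapes below are assumptions.

import numpy as np

def stack_output(A, beta):
    # A: (250, 10) with columns A_1..A_10; beta: (10,).
    # Stack the columns of A, then beta -> vector of size 2510.
    return np.concatenate([A.flatten(order='F'), beta])

def metric(A, S_tilde, R_tilde, tol=1e-6):
    # S_tilde, R_tilde: arrays of shape (754, 50) with predicted and realised returns.
    # Orthonormality check on the factor columns of A.
    gram = A.T @ A
    if np.max(np.abs(gram - np.eye(A.shape[1]))) > tol:
        return -1.0
    # Average cosine similarity between predicted and realised returns, t = 250..753.
    cos = [
        S_tilde[t] @ R_tilde[t] / (np.linalg.norm(S_tilde[t]) * np.linalg.norm(R_tilde[t]))
        for t in range(250, 754)
    ]
    return float(np.mean(cos))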

Return Forecasting of Cryptocurrency Clusters

The goal of the challenge is to predict the returns vs. bitcoin of clusters of cryptoassets.

At Napoleon, we are interested in detecting which assets are likely to move together in the same direction, i.e. assets whose returns (absolute price changes) are statistically positively correlated. Such assets are grouped into "clusters", which can be seen as the crypto equivalent of equity sectors or industries. The knowledge of such clusters can then be used to optimize portfolios, build advanced trading strategies (long/short absolute, market neutral), evaluate systematic risk, etc. In order to build new trading strategies, it can be helpful to know whether a given sector/cluster will outperform the market, represented by bitcoin. For this reason, given a cluster $\mathcal{C} = \{A_1, ..., A_n\}$ composed of $n$ assets $A_i$, this challenge aims at predicting the return relative to bitcoin of an equally weighted portfolio composed of $\{A_1, ..., A_n\}$ over the next hour, given the series of returns over the last 23 hours for the assets in the cluster, as well as some metadata.
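
As an illustration, a minimal sketch of how such a target could be computed from hourly per-asset returns, assuming a DataFrame with one column per asset including a 'BTC' column; the column names and the difference-of-returns convention are assumptions, not the official target definition.

import pandas as pd

def cluster_return_vs_btc(returns: pd.DataFrame, cluster: list) -> pd.Series:
    # Equally weighted portfolio return of the cluster at each hour.
    portfolio = returns[cluster].mean(axis=1)
    # Relative performance versus bitcoin (simple difference of returns here).
    return portfolio - returns['BTC']

# Example: hourly returns for a 3-asset cluster plus bitcoin.
data = pd.DataFrame({
    'ETH': [0.01, -0.02], 'SOL': [0.03, 0.00], 'ADA': [0.02, -0.01],
    'BTC': [0.01, -0.01],
})
print(cluster_return_vs_btc(data, ['ETH', 'SOL', 'ADA']))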

Bankers and markets

Your goal is to understand how financial markets react when central bankers deliver official speeches.

We do not provide the speeches themselves - otherwise the participants would quickly find out the date and the corresponding market moves! - but we provide a transformed version: the speeches were processed by a predefined BERT-style transformer, and this output constitutes the input of the problem. The output to predict is the mean price evolution of a collection of 39 different time series; these time series correspond to 13 different markets measured at 3 different time scales.

We have computed the difference between the closing prices of these 13 markets at 3 different maturities and the price of these markets at the closing time on the date of the speech. We are not interested in very short-term effects (between the beginning of the speech and the close of the same day) or in leaking effects (trading occurring because of information leakage before the beginning of the speech). A few tests have given us an indication that if a speech has an effect on the markets, it materializes within the two weeks following the date of the speech: we have therefore chosen lags of 1 day, 1 week and 2 weeks to measure the possible effects on the markets.

As expected, at first sight it was difficult to distinguish an effect. We have therefore developed a technique to boost the response of the transformer using numerical NLP techniques. We deliver here the result of this boosting. It is not miraculous, and the small number of points in the dataset is a real handicap.

The 13 markets are the following:

  1. VIX: Index for the volatility of US stocks.
  2. V2X: Index for the volatility of European stocks.
  3. EURUSD: Euro - US Dollar exchange rate.
  4. EUROUSDV1M: Volatility of at-the-money 1-month options on the EURUSD.
  5. SPX: Index of US stocks.
  6. SX5E: Index of euro stocks.
  7. SRVIX: Swap Rate Volatility Index, an interest-rate volatility index.
  8. CVIX: Crypto Volatility Index, a crypto-currency volatility index.
  9. MOVE: Developed by Merrill Lynch, it measures fear within the bond market.
  10. USGG2YR: US bonds, 2 years.
  11. USGG10YR: US bonds, 10 years.
  12. GDBR2YR: German bonds, 2 years.
  13. GDBR10YR: German bonds, 10 years.

Predicting odor compound concentrations

Can you predict the concentration of Sulfur dioxide (SO2) at one location from a network of sensors?

Using measurement data from the ATMO Normandie sensor network, weather data, and land-use data from Copernicus Corine Land Cover (CLC), the goal is to perform multivariate time series forecasting and predict the hourly SO2 concentration in μg/m³ for the next 12 hours at the Le Havre, MAS station, from the last 48 hours of data.
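
As an illustration of this forecasting setup, a minimal sketch of how one could frame the problem as supervised learning with sliding windows, assuming an hourly DataFrame whose 'SO2' column holds the target concentration; the column names and windowing details are assumptions.

import numpy as np
import pandas as pd

def make_windows(df: pd.DataFrame, target_col: str = 'SO2',
                 history: int = 48, horizon: int = 12):
    """Build (X, y) pairs: 48 h of all features -> next 12 h of the target."""
    values, target = df.values, df[target_col].values
    X, y = [], []
    for start in range(len(df) - history - horizon + 1):
        X.append(values[start:start + history].ravel())
        y.append(target[start + history:start + history + horizon])
    return np.array(X), np.array(y)

# Example with random hourly data for two auxiliary sensors plus the SO2 target.
idx = pd.date_range('2022-01-01', periods=200, freq='h')
df = pd.DataFrame(np.random.rand(200, 3), index=idx, columns=['SO2', 'NO2', 'wind'])
X, y = make_windows(df)
print(X.shape, y.shape)  # (141, 144) and (141, 12)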

Real estate price prediction

Real estate prices are usually predicted from numerical data: surface, location, etc. Can you do better by using photos?

Estimating housing real estate prices is quite a common topic, with a significant literature on estimating prices from usual data such as location, surface, land, number of bedrooms, or age of the building. These approaches are usually sufficient to estimate the price range but lack precision. However, few have investigated whether adding photos of the asset would bring complementary information, enabling a more precise price estimation. The objective is thus to model French housing real estate prices based on the usual hierarchical, tabular data plus a few photos (between 1 and 6) for each asset, and to see whether this allows better performance than a model trained without the photos.
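
A minimal PyTorch sketch of one possible multimodal design for this task; the backbone, feature dimensions and fusion by concatenation are illustrative assumptions, not the challenge baseline.

import torch
import torch.nn as nn
import torchvision.models as models

class PriceRegressor(nn.Module):
    """Combine a CNN photo encoder with an MLP over tabular features."""

    def __init__(self, n_tabular: int = 20):
        super().__init__()
        backbone = models.resnet18(weights=None)   # pretrained weights could be used instead
        backbone.fc = nn.Identity()                # keep the 512-d image embedding
        self.image_encoder = backbone
        self.tabular_encoder = nn.Sequential(nn.Linear(n_tabular, 64), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(512 + 64, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, photos, tabular):
        # photos: (batch, n_photos, 3, H, W); average the embeddings of the 1-6 photos.
        b, n, c, h, w = photos.shape
        img = self.image_encoder(photos.view(b * n, c, h, w)).view(b, n, -1).mean(dim=1)
        tab = self.tabular_encoder(tabular)
        return self.head(torch.cat([img, tab], dim=1)).squeeze(-1)

model = PriceRegressor()
pred = model(torch.randn(2, 3, 3, 224, 224), torch.randn(2, 20))  # 2 assets, 3 photos each
print(pred.shape)  # torch.Size([2])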

What do you see in the stock market data?

The goal of the challenge is to train machine learning models to look for anomalies within stock market data.

It is relatively easy to design one algorithm seeking one specific kind of event, and then to implement as many algorithms as there are types of atypical events. However, it would also be beneficial to be able to detect any type of atypical event with a single model, which could learn to recognize common features between these different events.

From a sample of market data (time series of price and volume data), the challenger is invited to predict the existence of an atypical event on an hourly basis per financial instrument. Any kind of approach can be experimented with, but the AMF is particularly interested in computer vision techniques applied to reconstructed time series plots, if the challenger thinks this is relevant.
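
As a simple starting point, a sketch of an unsupervised baseline using scikit-learn's IsolationForest on hourly price/volume features; the feature choice and column names are assumptions, and the actual challenge is evaluated as prediction of labelled atypical events, not unsupervised scoring.

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Illustrative hourly features for one instrument: return and volume (column names assumed).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'return': rng.normal(0, 0.01, 500),
    'volume': rng.lognormal(10, 1, 500),
})

# Fit an unsupervised anomaly detector and flag the most atypical hours.
clf = IsolationForest(contamination=0.01, random_state=0)
df['anomaly'] = clf.fit_predict(df[['return', 'volume']]) == -1
print(df['anomaly'].sum(), "hours flagged as atypical")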

Learning biological properties of molecules from their structure

The goal is to discover the biological properties of new chemical compounds using already existing experimental data.

The current costs of bringing a new medical drug to market are huge, reaching 2.0 billion US dollars and 10-15 years of continuous research. The desire to eliminate many of these unnecessary costs has accelerated the emergence and acceptance of the science of cheminformatics. Based on the concept that "similar chemicals have similar properties", one takes existing experimental data Y and builds statistical correlative models that map structures of chemical compounds to the observed Y values. Thus, the property Y of a novel chemical compound would not have to be measured: one would simply draw the structure of a completely new molecule on the computer screen and submit it to the correlative model to predict its Y value.

Computers cannot perceive chemical structures (atoms plus interatomic connectivity) the way human chemists do. A translation of chemical structures into terms understandable by computers is thus necessary. Sophisticated algorithms exist that take molecular connectivity tables and, sometimes, 3D atomic coordinates to generate molecular descriptors – numeric variables that describe molecular structures. Our software is able to calculate the same set of N molecular descriptors per compound, where N is on the order of several hundred. Collecting the properly aligned vectors of descriptors for M chemical compounds, each with known observed Y value, forms an MxN training matrix X. Since raw values of different molecular descriptors are calculated on different scales, normalization to a common scale is required prior to modeling (e.g., the -1 to 1 scale). Not all the descriptors provide meaningful input into a successful Y = f(X) model. Therefore, choosing the “right” descriptors for modeling (a.k.a. feature selection) is the first critical step in model building. Once the suitable subset of N columns is chosen, the corresponding reduced training matrix along with M-dimensional vector Y is submitted to a model training algorithm (e.g., machine learning). Model performance is evaluated on an independent, external test set of T compounds encoded with the same set of N descriptors.