Challenge Data

Regional climate forecast

Because of an error in the true prediction file (y_test file) this challenge will be removed at the end of June and the winners won't be rewarded. It will be proposed again next year with appropriate data.

Started on Jan. 4, 2021

Challenge context

The prediction of temperature anomalies on interannual timescales (1 to 5 years) is one of the most challenging topics in climate science. A recent study has shown that the Earth's temperature anomaly, when globally averaged, has a certain predictability 1 to 5 years into the future. The aim of this challenge is to demonstrate such skillful predictability at the regional scale.

However, in climate science, because of the inherently chaotic nature of the studied system, predictions have to be made in a probabilistic framework. Indeed, for risk assessment and early mitigation, the likelihood of extreme events is potentially as important as knowledge of the expected event. Hence a skillful prediction has to be both accurate (minimal error in the prediction) and reliable (good sampling of the prediction spread).

Another challenge of climate science is the lack, or limited amount, of data available to train a prediction system. This challenge therefore aims to find an algorithm that predicts the temperature anomalies of the following years from the past 10 years of data and provides a probabilistic prediction, or at least the expected prediction together with its uncertainty (under a Gaussian assumption).
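To illustrate what a prediction "under a Gaussian assumption" gives access to, here is a minimal sketch that turns a predicted mean and variance into an exceedance probability and a central interval. The function names and the example numbers are hypothetical and are not part of the challenge material.

```python
import math


def prob_exceeds(threshold, mean, variance):
    """P(X > threshold) for X ~ Normal(mean, variance)."""
    z = (threshold - mean) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def central_interval(mean, variance, z=1.645):
    """Central interval of a Gaussian; z=1.645 covers roughly 90%."""
    half_width = z * math.sqrt(variance)
    return mean - half_width, mean + half_width


# Hypothetical regional prediction: mean 0.2, variance 0.04.
p_warm = prob_exceeds(0.5, mean=0.2, variance=0.04)
low, high = central_interval(mean=0.2, variance=0.04)
```

With only the mean and the variance, the likelihood of an extreme anomaly in a region can be estimated, which is why the challenge asks for both quantities.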

Ref: Sévellec, F., & Drijfhout, S. S. (2018). A novel probabilistic forecast system predicting anomalously warm 2018-2022 reinforcing the long-term global warming trend. Nature Communications, 9(1), [3024]. DOI: 10.1038/s41467-018-05442-8

Challenge goals

To estimate the validity of the predictions we propose to use two different measures: the coefficient of determination (R2), which shows the skill of the mean prediction; and the reliability, which measures the accuracy of the spread in the prediction.

The mathematical details are available in the associated file.
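The associated file is not reproduced here, so the following is only a sketch of the two measures. The coefficient of determination follows its standard definition. The reliability formula is an assumption: it is taken as the root mean square of the prediction error normalized by the predicted standard deviation, so that a value close to 1 indicates a well-calibrated spread. The official metric may differ.

```python
import math


def r2_score(y_true, y_mean):
    """Standard coefficient of determination of the mean prediction."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - m) ** 2 for t, m in zip(y_true, y_mean))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def reliability(y_true, y_mean, y_var):
    """ASSUMED definition: RMS of the error scaled by the predicted spread.

    Values near 1 mean the predicted variance matches the actual error;
    values above 1 mean the prediction is over-confident.
    """
    ratios = [(t - m) ** 2 / v for t, m, v in zip(y_true, y_mean, y_var)]
    return math.sqrt(sum(ratios) / len(ratios))
```

Under this assumed definition, a prediction can have a high R2 and still be unreliable if its variance is much smaller or much larger than its actual squared error.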

Data description

The input data contain surface temperature anomalies over the whole world. Each dataset is composed of 10 years of data from 22 climate model realizations, with 3072 points over the whole world. The 192 points over the whole world that correspond to the 5-year average value following those 10 years are also provided for the same 22 climate models. This information should be used to evaluate the variance of the prediction. For each dataset, the 10 years of data at 3072 points over the whole world are also provided for the model to predict (the observation, model id 0).

The two files train_X.csv and test_X.csv with respectively 5 and 2 data sets have the same format with 6 columns:

  • ID: a unique ID for each value.
  • DATASET: the dataset ID. The database stores several consistent datasets, each independent of the others. A dataset is composed of:

    • 3072 temperature anomalies over the whole world from 22 models (model IDs 1 to 22) for each of the 10 years;
    • 3072 temperature anomalies over the whole world from the observation (model ID 0) for each of the 10 years;
    • the 192 predicted temperature anomalies over the whole world for the 22 models.

  • MODEL: the model ID (1-22 for the models, 0 for the observation).
  • TIME: the time index (0-9 for the 10-year history, 10 for the predicted date).
  • POSITION: the Earth coordinate as a HEALPix pixel index (nside=4 for the prediction, nside=16 for the history), in nested ordering.
  • VALUE: the corresponding temperature anomaly.
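The two resolutions in the POSITION column are related in a simple way. A HEALPix map has 12 × nside² equal-area pixels, hence 3072 pixels at nside=16 and 192 at nside=4, and in nested ordering each pixel contains four children numbered 4p to 4p+3 at the next resolution. The sketch below uses these two facts to average a history map down to the prediction grid. The function names are hypothetical.

```python
NSIDE_HISTORY = 16  # resolution of the 10-year history
NSIDE_TARGET = 4    # resolution of the values to predict


def npix(nside):
    """Number of pixels of a HEALPix map at the given nside."""
    return 12 * nside * nside


def parent_pixel(pixel, nside_in=NSIDE_HISTORY, nside_out=NSIDE_TARGET):
    """Nested index of the coarse pixel that contains a fine pixel."""
    fine_per_coarse = (nside_in // nside_out) ** 2  # 16 for 16 -> 4
    return pixel // fine_per_coarse


def degrade(values):
    """Average a nested nside=16 map (3072 values) to nside=4 (192 values)."""
    n_out = npix(NSIDE_TARGET)
    fine_per_coarse = npix(NSIDE_HISTORY) // n_out
    sums = [0.0] * n_out
    for pixel, value in enumerate(values):
        sums[parent_pixel(pixel)] += value
    return [s / fine_per_coarse for s in sums]
```

A plain average is adequate here because HEALPix pixels all have the same area. The healpy package offers `ud_grade` for the same purpose; its `order_in` argument would need to be set to nested for this data.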

The file train_Y.csv, with 5 datasets, stores the predicted data for the observation. It has 4 columns:

  • ID: a unique ID for each value.
  • DATASET: the ID of the dataset (0-4), corresponding to the dataset ID of the train_X.csv file.
  • POSITION: the HEALPix pixel ID at nside=4 resolution (192 pixels), in nested ordering.
  • VALUE: the temperature anomaly.

The random_bench.csv file is an example of a candidate file. It should contain the predicted value and the corresponding variance for the 192 regions all over the world. This file has 5 columns:

  • ID: a unique ID for each value.
  • DATASET: the ID of the dataset (0-2), corresponding to the dataset ID of the test_X.csv file.
  • POSITION: the HEALPix pixel ID at nside=4 resolution (192 pixels), in nested ordering.
  • MEAN: the temperature anomaly prediction.
  • VARIANCE: the variance of the temperature prediction.
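As an illustration of this layout, here is a minimal sketch that writes a candidate file with the five columns listed above. The comma delimiter, the header row, and the sequential ID numbering are assumptions; random_bench.csv should be checked for the exact conventions. The function name is hypothetical.

```python
import csv
import io


def write_submission(handle, predictions):
    """Write rows of (dataset, position, mean, variance) as a candidate file.

    ASSUMPTIONS: comma-separated values, a header row, and IDs numbered
    from 0 in row order.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["ID", "DATASET", "POSITION", "MEAN", "VARIANCE"])
    for row_id, (dataset, position, mean, variance) in enumerate(predictions):
        if variance <= 0:
            raise ValueError("variance must be strictly positive")
        writer.writerow([row_id, dataset, position, mean, variance])


# Usage with an in-memory buffer; a real file handle works the same way.
buffer = io.StringIO()
write_submission(buffer, [(0, 0, 0.12, 0.05), (0, 1, -0.03, 0.04)])
```

Rejecting non-positive variances is a design choice of this sketch: a zero variance would make any error-to-spread ratio undefined.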

Two other Python scripts help to understand the challenge:

  • The first provides an example of prediction using the last known value for the mean and the distribution of the 22 models for the variance.

  • The second shows the 10-year map and the corresponding prediction for one model. This Python script uses the healpy package to describe the Earth coordinates. The script is inside metric_plot.pdf.
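The provided scripts are not reproduced here. The following is an independent sketch of the idea described in the first item: the mean is the last observed year (TIME=9, model 0) averaged onto the 192 regions, and the variance is the spread of the 22 model predictions (TIME=10) in each region. The function name and the use of the sample variance are assumptions.

```python
import statistics


def persistence_baseline(obs_last_year, model_predictions):
    """Last-known-value mean and multi-model variance per region.

    obs_last_year: observed anomalies at TIME=9, nested ordering
        (3072 values at nside=16 in the challenge data).
    model_predictions: one list per model of predicted anomalies at
        TIME=10 (22 lists of 192 values in the challenge data).
    """
    n_regions = len(model_predictions[0])
    per_region = len(obs_last_year) // n_regions
    # In nested ordering the fine pixels of region r are contiguous.
    means = [
        sum(obs_last_year[r * per_region:(r + 1) * per_region]) / per_region
        for r in range(n_regions)
    ]
    # ASSUMPTION: sample variance (n - 1 denominator) across the models.
    variances = [
        statistics.variance([model[r] for model in model_predictions])
        for r in range(n_regions)
    ]
    return means, variances
```

If all models agree in a region, the variance is zero, so a small floor may be needed before such a prediction is scored.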

Benchmark description

The goal is to provide the mean and the variance of the 192 regional temperature anomaly predictions. The metric characterizes the accuracy of the mean and the consistency of the predicted error with the actual one.



