Climate
Because of an error in the true prediction file (y_test file) this challenge will be removed at the end of June and the winners won't be rewarded. It will be proposed again next year with appropriate data.
Started on Jan. 4, 2021
The prediction of temperature anomalies on interannual timescales (1 or 5 years) is one of the most challenging topics in climate science. A recent study has shown that the globally averaged Earth temperature anomaly has some predictability 1 to 5 years into the future. The aim of this study is to demonstrate such skillful predictability at the regional scale.
However, in climate science, because of the inherently chaotic nature of the studied system, predictions have to be made in a probabilistic framework. Indeed, for risk assessment and early mitigation, the likelihood of extreme events is potentially as important as the knowledge of the expected event. Hence a skillful prediction has to be accurate (minimal error in prediction) and reliable (good sampling of prediction spread).
Another challenge of climate science is the lack, or limited amount, of data available to train a prediction system. Thus, this challenge aims to find an algorithm that predicts the temperature anomalies of the following years using the past 10 years of data and provides a probabilistic prediction, or at least the expected prediction together with its uncertainty (under a Gaussian assumption).
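To illustrate what "expected prediction together with its uncertainty (under a Gaussian assumption)" gives you in practice, here is a minimal sketch that turns a mean and a variance into a central interval and an exceedance probability. The function name gaussian_summary, the threshold and the numbers are hypothetical and are not part of the challenge material.

```python
from statistics import NormalDist


def gaussian_summary(mean, variance, threshold, coverage=0.9):
    """Summarize a Gaussian forecast N(mean, variance).

    Returns the central interval with the requested coverage and the
    probability that the anomaly exceeds `threshold`.
    All names and default values here are illustrative assumptions.
    """
    dist = NormalDist(mu=mean, sigma=variance ** 0.5)
    tail = (1.0 - coverage) / 2.0
    lower = dist.inv_cdf(tail)
    upper = dist.inv_cdf(1.0 - tail)
    p_exceed = 1.0 - dist.cdf(threshold)
    return lower, upper, p_exceed


# Hypothetical regional forecast: mean anomaly 0.4 with variance 0.04.
lower, upper, p_exceed = gaussian_summary(mean=0.4, variance=0.04, threshold=0.6)
print(f"90% interval: [{lower:.3f}, {upper:.3f}]")
print(f"P(anomaly > 0.6) = {p_exceed:.3f}")
```

With only the mean, the probability of an extreme event could not be estimated at all; the variance is what makes the risk statement possible.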
Ref: Sévellec, F., & Drijfhout, S. S. (2018). A novel probabilistic forecast system predicting anomalously warm 2018-2022 reinforcing the long-term global warming trend. Nature Communications, 9(1), [3024]. DOI: 10.1038/s41467-018-05442-8
To assess the validity of the predictions, we propose two measures: the coefficient of determination (R2), which shows the skill of the mean prediction, and the reliability, which measures the accuracy of the spread in the prediction.
The mathematical details are available in the associated file.
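As a rough illustration of the two measures, the sketch below computes the standard coefficient of determination and one possible reliability proxy. The R2 formula is the usual one. The reliability proxy (root mean square of the error divided by the predicted standard deviation, close to 1 when the predicted spread matches the actual error) is an assumption made here for illustration. The exact definition used for scoring is the one in the associated file and may differ. The function names r2_score and spread_ratio and all numbers are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Standard coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def spread_ratio(y_true, y_pred, var_pred):
    """Hypothetical reliability proxy (NOT the official metric).

    Root mean square of the standardized error. A value near 1 means the
    predicted variance matches the actual squared error on average;
    above 1 the forecast is overconfident, below 1 it is underconfident.
    """
    n = len(y_true)
    total = sum((t - p) ** 2 / v for t, p, v in zip(y_true, y_pred, var_pred))
    return (total / n) ** 0.5


# Made-up values for four regions.
y_true = [0.1, 0.3, -0.2, 0.5]
y_pred = [0.2, 0.2, -0.1, 0.4]
var_pred = [0.01, 0.01, 0.01, 0.01]

print(f"R2 = {r2_score(y_true, y_pred):.4f}")
print(f"spread ratio = {spread_ratio(y_true, y_pred, var_pred):.4f}")
```

In this toy case every error is 0.1 in magnitude and every predicted standard deviation is 0.1, so the proxy is 1: the spread is neither too wide nor too narrow.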
The input data contain surface temperature anomalies all over the world. Each dataset is composed of 10 years of data from 22 climate model realizations at 3072 points covering the whole world. The corresponding values at 192 points covering the whole world, which are the 5-year averages following those 10 years for the same 22 climate models, are also provided. This information should be used to evaluate the variance of the prediction. For each dataset, the 10 years of data at 3072 points covering the whole world for the model to predict are also provided.
The two files train_X.csv and test_X.csv, with 5 and 2 datasets respectively, share the same format with 6 columns:
DATASET: the dataset id. The database stores several consistent datasets. Each dataset is independent. A dataset is composed of:
MODEL: model id (1-22) for the models and (0) for the observation
The file train_Y.csv, with 5 datasets, stores the predicted data for the observation. It has 4 columns:
The random_bench.csv file is an example of a candidate file. It should contain the predicted value and the corresponding variance for the 192 regions all over the world. This file has 5 columns:
Two Python scripts are also provided to help understand the challenge.
climate_example.py provides an example of prediction using the last known value for the mean and the 22 model distribution for the variance.
show_model.py shows the 10-year maps and the corresponding prediction for one model. This Python script uses the package healpy (https://healpy.readthedocs.io/en/latest/) to describe the Earth coordinates. The script is inside the metric_plot.pdf.
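The baseline described for climate_example.py (last known value for the mean, spread of the 22 models for the variance) can be sketched for a single region as follows. This is not the content of climate_example.py, only a minimal reading of its description. It assumes the observation history has already been brought onto the same region as the 5-year model averages, and it uses the sample variance, which is also an assumption. The function name persistence_baseline and the numbers are hypothetical.

```python
from statistics import variance


def persistence_baseline(obs_history, model_targets):
    """Baseline forecast for one region.

    obs_history   : yearly anomalies of the series to predict (e.g. 10 values),
                    assumed to be already aggregated to the target region.
    model_targets : the models' 5-year average anomalies for the same region
                    (22 values in the challenge; fewer here for brevity).

    Mean = last known value (persistence).
    Variance = sample variance across the models (assumed estimator).
    """
    mean_pred = obs_history[-1]
    var_pred = variance(model_targets)
    return mean_pred, var_pred


# Made-up numbers for one region.
obs_history = [0.05, 0.10, 0.08, 0.15, 0.20, 0.18, 0.25, 0.30, 0.28, 0.35]
model_targets = [0.1, 0.2, 0.3, 0.4, 0.5]

mean_pred, var_pred = persistence_baseline(obs_history, model_targets)
print(f"predicted mean = {mean_pred:.2f}, predicted variance = {var_pred:.4f}")
```

Applying the same function to each of the 192 regions would give the mean and variance pairs expected in a candidate file.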
The goal is to provide the mean and the variance of the 192 regional temperature anomaly predictions. The metric characterizes the accuracy of the mean and the consistency of the predicted error with the actual one.
Files are accessible when logged in and registered to the challenge