Challenge Data

Solar forecasting using Copernicus radiation images
by MINES ParisTech, PSL Research University

For this challenge, submitted files can be large (more than 100 MB). Processing your submission may take a few minutes: be patient!



Competitive challenge
Time series
More than 1GB
Advanced level


Started on Dec. 10, 2021

Challenge context

In the domains of solar energy and energy meteorology, there is a need for accurate intraday solar forecasting, hereinafter called short-term forecasting. Short-term forecasts allow a better integration of photovoltaic (PV) systems by anticipating the variability of solar radiation in space and time. This is particularly important in electric systems with a high penetration of solar energy, where dispatching generation units so that electricity production matches consumption at every moment is particularly challenging. This need holds not only for large-scale PV integration into electric grids but also for the special case of off-grid electricity supply systems.

Geostationary satellites, notably through algorithms such as Heliosat, are a source of spatially and temporally resolved surface solar irradiance (SSI), the "fuel" of PV systems (unit: W/m²). In the framework of the Copernicus Atmosphere Monitoring Service (CAMS), the multispectral images acquired by Meteosat Second Generation (MSG) at longitude 0° are used to provide, on a near-real-time basis and every 15 min, images of SSI and of SSI under clear-sky conditions at 3 km resolution. These services, CAMS Rad and CAMS McClear respectively, are operated and maintained by Transvalor Innovation SoDA, in collaboration with DLR, the German Aerospace Center. This source of SSI image time series is notably used for short-term (up to 2 hours) solar forecasting. State-of-the-art satellite-based short-term solar forecasting relies on cloud motion vectors (CMV) estimated with optical flow or block-matching techniques.
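To make the block-matching idea concrete, here is a deliberately simplified sketch in which the whole image is treated as a single block: it estimates one global integer cloud motion vector between two consecutive 15-min images by exhaustive search, then extrapolates ("advects") the latest image with that motion. It assumes a uniform motion over the image, integer pixel shifts and wrap-around borders (from `np.roll`); operational CMV methods, including the benchmark mentioned below, estimate dense local motion fields and treat borders differently. The function names are hypothetical.

```python
import numpy as np


def estimate_global_shift(prev, curr, max_shift=3):
    """Return the integer (dy, dx) that best maps `prev` onto `curr`.

    Exhaustive block matching with the whole image as one block: each
    candidate shift is scored by the mean squared error between the shifted
    previous image and the current one, ignoring a border of width
    `max_shift` where np.roll wraps pixels around.
    """
    m = max_shift
    inner = (slice(m, prev.shape[0] - m), slice(m, prev.shape[1] - m))
    best, best_err = (0, 0), np.inf
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            shifted = np.roll(prev, shift=(dy, dx), axis=(0, 1))
            err = np.mean((shifted[inner] - curr[inner]) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best


def advect(image, shift, steps):
    """Extrapolate `image` by applying the same per-step motion `steps` times."""
    dy, dx = shift
    return np.roll(image, shift=(dy * steps, dx * steps), axis=(0, 1))
```

For instance, a shift estimated between the images at $t-15$ min and $t$ can be applied with `steps=1` to `steps=4` to obtain rough forecasts for $t+15$ min to $t+60$ min. Applying it to a clear-sky index rather than raw GHI avoids mistaking the diurnal cycle for motion.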

Challenge goals

The aim of this challenge is to propose machine learning and deep learning approaches on image sequences that provide better short-term forecasts of future SSI images on a horizontal plane, noted GHI (Global Horizontal Irradiance), for time horizons ranging from 15 minutes to 1 hour, with a time resolution of 15 min and a spatial resolution of 3 km.

More precisely, we are interested in a square region of interest (RoI) of 51 x 51 pixels (approx. 150 km). Assuming a maximum cloud speed of 10+ m/s and forecasting up to 1 hour ahead, the observation region (OR) encompassing the RoI has a size of 81 x 81 pixels (approx. 240 km).
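These sizes follow from simple arithmetic: a cloud moving at 10 m/s travels 36 km, i.e. 12 pixels of 3 km, in one hour, and the 81-pixel OR adds a margin of 15 pixels (45 km) on each side of the 51-pixel RoI, which leaves some headroom for speeds above 10 m/s. A short check, assuming exactly 3 km per pixel (the actual MSG pixel size varies with location):

```python
PIXEL_KM = 3.0       # nominal resolution (assumption: constant over the region)
ROI_PX, OR_PX = 51, 81
HORIZON_S = 3600     # 1-hour forecast horizon, in seconds


def displacement_px(speed_m_s, horizon_s=HORIZON_S, pixel_km=PIXEL_KM):
    """Distance travelled by a cloud over the horizon, in pixels."""
    return speed_m_s * horizon_s / 1000.0 / pixel_km


margin_px = (OR_PX - ROI_PX) // 2                 # pixels on each side of the RoI
max_speed = margin_px * PIXEL_KM * 1000.0 / HORIZON_S  # m/s covered by that margin

print(f"RoI ~ {ROI_PX * PIXEL_KM:.0f} km, OR ~ {OR_PX * PIXEL_KM:.0f} km")
print(f"10 m/s over 1 h -> {displacement_px(10):.0f} px")
print(f"margin = {margin_px} px -> covers speeds up to {max_speed:.1f} m/s")
```

This prints a RoI of about 153 km, an OR of about 243 km, a 12-pixel displacement at 10 m/s, and a 15-pixel margin covering speeds up to 12.5 m/s.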

At a given time $t$, between one hour after sunrise and one hour before sunset, and given the sequence of the 4 previous 15-min $\text{GHI}$ images on the OR, the solar forecasting aims at predicting the $\text{GHI}$ images on the RoI for the next hour, from 15 min up to 60 min ahead with a time step of 15 min. The forecast of $\text{GHI}$ at location $p$, made at time $t$ for the future time $t + \Delta t$, is noted $\widehat{\text{GHI}}(p,\ t + \Delta t \mid t)$.

The learning phase is done on one year of data and the test phase is done on a separate year.

In this challenge, we only consider cloud effects on $\text{GHI}$, assuming that the concomitant and collocated $\text{GHI}$ under clear-sky conditions (no clouds), noted $\text{GHI}_{cls}$, is perfectly known.
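Since $\text{GHI}_{cls}$ is assumed known, a common way to isolate cloud effects is the clear-sky index $k_c = \text{GHI} / \text{GHI}_{cls}$, which is around 1 under clear skies and lower when clouds attenuate irradiance. The challenge does not prescribe this transform; the sketch below is one possible preprocessing step, with a hypothetical floor `min_cls` (in W/m²) that avoids dividing by near-zero clear-sky values.

```python
import numpy as np


def clear_sky_index(ghi, ghi_cls, min_cls=10.0):
    """Element-wise ratio GHI / GHI_cls, set to NaN where GHI_cls < min_cls."""
    ghi = np.asarray(ghi, dtype=float)
    ghi_cls = np.asarray(ghi_cls, dtype=float)
    kc = np.full(np.broadcast(ghi, ghi_cls).shape, np.nan)
    valid = ghi_cls >= min_cls
    # Only divide where the clear-sky value is large enough; elsewhere keep NaN.
    np.divide(ghi, ghi_cls, out=kc, where=valid)
    return kc
```

With the arrays described below, the past clear-sky index would be `clear_sky_index(GHI, CLS[..., :4])`, since the first 4 CLS channels cover the same times as the 4 GHI channels.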

Contextual information of interest includes the corresponding solar zenith angle (SZA) $\theta_{S}$ and solar azimuth angle (SAA) $\alpha_{S}$.

Do not hesitate to refer to the full Copernicus documentation in the supplementary files, where several technical aspects of the challenge are explained in more detail.

Data description

The training set contains 1845 samples, and the test set contains 1841 samples. Each sample represents a time $t$ at which we consider the previous images and wish to predict the next ones.

Data format

1. Input

In practice, the input $X$ is stored in the NumPy .npz format and consists of:

  • datetime: the time $t$ at which the 4 previous 15-min $\text{GHI}$ images on the OR are considered. This vector of length $n_{samples}$ is of datetime type (YYYYMMDDHHMM).

  • GHI: a matrix of size ($n_{samples}$, 81, 81, 4) with the sequence of the 4 previous 15-min $\text{GHI}$ images (of size (81, 81)) on the OR for the times $t-45$ min, $t-30$ min, $t-15$ min and $t$.

  • CLS: a matrix of size ($n_{samples}$, 81, 81, 8) with the sequence of the 4 previous and 4 next modelled 15-min clear-sky (i.e. cloud-free) $\text{GHI}$ images (of size (81, 81)) on the OR, noted $\text{GHI}_{cls}$, for the times $t-45$ min, $t-30$ min, $t-15$ min, $t$, $t+15$ min, $t+30$ min, $t+45$ min and $t+60$ min.

  • SZA: a matrix of size ($n_{samples}$, 81, 81, 8) with the sequence of the 4 previous and 4 next modelled 15-min SZA images (of size (81, 81)) on the OR for the times $t-45$ min, $t-30$ min, $t-15$ min, $t$, $t+15$ min, $t+30$ min, $t+45$ min and $t+60$ min.

  • SAA: a matrix of size ($n_{samples}$, 81, 81, 8) with the sequence of the 4 previous and 4 next modelled 15-min SAA images (of size (81, 81)) on the OR for the times $t-45$ min, $t-30$ min, $t-15$ min, $t$, $t+15$ min, $t+30$ min, $t+45$ min and $t+60$ min.
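The last axis of GHI, CLS, SZA and SAA indexes time in 15-min steps. With the offsets listed above, channel 3 corresponds to time $t$ in all four arrays, and channels 4 to 7 of CLS, SZA and SAA correspond to the four forecast horizons. The small sketch below makes that mapping explicit and parses the YYYYMMDDHHMM stamps, assuming they are stored as integers or strings with exactly that layout (if the file stores NumPy datetime64 values instead, the parsing step is unnecessary). The constant and function names are hypothetical.

```python
from datetime import datetime, timedelta

# Offsets (in minutes, relative to t) of each channel on the last axis.
PAST_OFFSETS = [-45, -30, -15, 0]               # GHI: 4 channels
ALL_OFFSETS = PAST_OFFSETS + [15, 30, 45, 60]   # CLS, SZA, SAA: 8 channels


def parse_stamp(stamp):
    """Parse a YYYYMMDDHHMM stamp (int or str) into a datetime."""
    return datetime.strptime(str(stamp), "%Y%m%d%H%M")


def channel_times(stamp, offsets=ALL_OFFSETS):
    """Absolute time of each channel for the sample taken at `stamp`."""
    t = parse_stamp(stamp)
    return [t + timedelta(minutes=m) for m in offsets]
```

For example, `channel_times(202006151200)` would list the eight times from 11:15 to 13:00 on 15 June 2020, with 12:00 at index 3.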

Note that $n_{samples}$ = 1845 for the training set and $n_{samples}$ = 1841 for the test set.

To load and read the contents of a .npz file, one can use the following:

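```python
import numpy as np

# Load the .npz file
X = np.load('filename.npz', allow_pickle=True)

# Display the names of the arrays stored in the .npz file
print(X.files)

# Access the contents of the .npz file
date = X['datetime']
GHI = X['GHI']
CLS = X['CLS']
SZA = X['SZA']
SAA = X['SAA']  # solar azimuth angles; check X.files if the key is spelled differently
```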


2. Output

2.1 4D format

The output vector $y$ represents the sequence of the 4 next 15-min $\text{GHI}$ images on the RoI, corresponding to a matrix of size ($n_{samples}$, 51, 51, 4), for the 4 future times $t+15$ min, $t+30$ min, $t+45$ min and $t+60$ min.

2.2 2D format

In this challenge, the output vector $y$ is provided in a raw 2D format: a dataframe of size ($n_{samples}$, 51x51x4+1) = ($n_{samples}$, 10405), whose first column is id_sequence (the id of the considered time sequence $t$).

2.3 From 2D to 4D output format

In order to transform the raw 2D output into a 4D matrix format (useful, for instance, to display the various images of the output vector $y$), it is necessary to:

  • First, remove the id_sequence column.

  • Second, use the following transformation:

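```python
# y_raw: output without the id_sequence column; the result has shape (n_samples, 4, 51, 51)
y_4D = np.transpose(np.reshape(np.array(y_raw), (-1, 4, 51, 51)), (0, 1, 3, 2))
```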

2.4 From 4D to 2D output format

In order to transform the 4D output to a 2D raw format (which will be compulsory when submitting your model predictions), it is necessary to:

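  • Use the following transformation:

```python
# y_4D has shape (n_samples, 4, 51, 51), as produced by the transformation in 2.3
y_2D = np.transpose(y_4D, (0, 1, 3, 2)).reshape(-1, 10404)
```

  • Transform the array into a dataframe.

  • Add an index column id_sequence.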

These transformations are already implemented in the (cf. supplementary files).
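Note that the transformation in 2.3 yields an array of shape ($n_{samples}$, 4, 51, 51), with the horizon on the second axis, whereas 2.1 describes ($n_{samples}$, 51, 51, 4). The sketch below wraps both conversions with the same transpose/reshape operations as above and adds an optional `np.moveaxis` to switch to a horizon-last layout. The function names are hypothetical, and adding or removing the id_sequence column (for instance when building the pandas dataframe) is left to the caller.

```python
import numpy as np

N_H, N_PX = 4, 51             # forecast horizons, RoI side in pixels
N_COLS = N_H * N_PX * N_PX    # 10404 value columns (id_sequence excluded)


def raw_to_4d(y_raw, horizon_last=False):
    """(n, 10404) values -> (n, 4, 51, 51), or (n, 51, 51, 4) if horizon_last."""
    y = np.transpose(np.reshape(np.asarray(y_raw), (-1, N_H, N_PX, N_PX)), (0, 1, 3, 2))
    return np.moveaxis(y, 1, -1) if horizon_last else y


def to_raw_2d(y_4d, horizon_last=False):
    """Inverse of raw_to_4d: back to the (n, 10404) submission layout."""
    y = np.moveaxis(y_4d, -1, 1) if horizon_last else np.asarray(y_4d)
    return np.transpose(y, (0, 1, 3, 2)).reshape(-1, N_COLS)
```

A round trip `to_raw_2d(raw_to_4d(y_raw))` returns the original values unchanged, which is a cheap sanity check before writing a submission.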


The OR and the RoI are concentric; with Python-style indexing:

```python
RoI = OR[15:66, 15:66]
```
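The offsets follow from centring a 51-pixel square in an 81-pixel one: (81 - 51) / 2 = 15 and 15 + 51 = 66. On the full input arrays, which carry a sample axis first and a time axis last, the same crop applies to the two middle axes. A minimal sketch (the helper name is hypothetical):

```python
import numpy as np

OR_PX, ROI_PX = 81, 51
OFFSET = (OR_PX - ROI_PX) // 2   # 15 pixels on each side


def crop_roi(arr):
    """Crop the central RoI from arrays shaped (n_samples, 81, 81, n_times)."""
    return arr[:, OFFSET:OFFSET + ROI_PX, OFFSET:OFFSET + ROI_PX, :]
```

For instance, `crop_roi(CLS)` has shape ($n_{samples}$, 51, 51, 8), and the centre pixel of the OR (index 40) becomes the centre pixel of the RoI (index 25).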

Benchmark description

Two simple forecasting methods are provided as benchmarks:

  • The persistence forecast $P$:

    $$\widehat{\text{GHI}}_{P}(p,\ t + \Delta t \mid t) = \text{GHI}(p,\ t)\,\frac{\text{GHI}_{cls}(p,\ t + \Delta t)}{\text{GHI}_{cls}(p,\ t)}$$

    This forecast is used as the baseline for computing the skill score (SC).

  • The CMV forecast, based on state-of-the-art optical flow and CMV persistence.
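The persistence formula can be written directly with the input arrays: the last GHI channel is $\text{GHI}(p,\ t)$, CLS channel 3 is $\text{GHI}_{cls}(p,\ t)$, and CLS channels 4 to 7 are $\text{GHI}_{cls}(p,\ t + \Delta t)$ for the four horizons. The sketch below assumes those channel positions (taken from the input description) and crops to the RoI with `OR[15:66, 15:66]`. Setting the forecast to 0 where the clear-sky value at $t$ is below a hypothetical `min_cls` is our own choice and may differ from the provided benchmark.

```python
import numpy as np

ROI = slice(15, 66)   # RoI = OR[15:66, 15:66]


def persistence_forecast(ghi, cls, min_cls=1.0):
    """Persistence of the clear-sky-relative state, evaluated on the RoI.

    ghi: (n, 81, 81, 4) past GHI, last channel at time t.
    cls: (n, 81, 81, 8) clear-sky GHI, channel 3 at t, channels 4-7 at t+15..t+60 min.
    Returns (n, 51, 51, 4) forecasts for t+15, t+30, t+45 and t+60 min.
    """
    ghi_t = ghi[:, ROI, ROI, 3:4]     # keep a length-1 time axis for broadcasting
    cls_t = cls[:, ROI, ROI, 3:4]
    cls_future = cls[:, ROI, ROI, 4:8]
    # GHI_cls(t + dt) / GHI_cls(t), left at 0 where GHI_cls(t) is too small
    ratio = np.divide(cls_future, cls_t,
                      out=np.zeros(cls_future.shape), where=cls_t >= min_cls)
    return ghi_t * ratio
```

The result has the 4D layout of section 2.1, ($n_{samples}$, 51, 51, 4).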

We provide the persistence benchmark $P$ for the test set, while the CMV benchmark, also for the test set, will be added as supplementary data.

Candidates are free to benchmark their model against either of these forecasting methods.
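The page does not spell out how the skill score (SC) is computed. A common definition in solar forecasting is $SC = 1 - \text{RMSE}_{model} / \text{RMSE}_{ref}$, where the reference is a baseline such as persistence: SC is 1 for a perfect forecast, 0 when the model is no better than the reference, and negative when it is worse. The sketch below assumes that definition, pooled over all samples, pixels and horizons; the official metric may differ (for instance computed per horizon, or with another error measure).

```python
import numpy as np


def rmse(pred, obs):
    """Root mean squared error over all samples, pixels and horizons."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def skill_score(pred, ref, obs):
    """Assumed definition: 1 - RMSE(model) / RMSE(reference forecast)."""
    return 1.0 - rmse(pred, obs) / rmse(ref, obs)
```

To obtain one score per horizon, the same function can be applied to the slices `[..., k]` of 4D arrays shaped ($n_{samples}$, 51, 51, 4).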



The challenge provider


Research on satellite-based surface solar irradiance forecasting

Congratulations to the winners of the challenge

1 Jérôme Gaveau
1 Benjamin Duguet
2 Jacques de Chevron Villette
3 FredZ & FlorentinP

You can find the whole list of winners of the season here