Challenge Data

Precipitation Nowcasting
by PlumeLabs




Description


Competitive challenge
Physics
Environment
Regression
Images
Time series
More than 1GB
Advanced level

Dates

Started on Jan. 6, 2023


Challenge context

Nowcasting is weather forecasting over a short-term period of up to 6 hours. The forecast extrapolates measured weather parameters in time, using techniques that take into account a possible evolution of the air mass. This type of forecast includes details that are not resolved by numerical weather prediction models running over longer forecast periods.
This challenge focuses on precipitation nowcasting, which is key for many industries (agriculture, insurance, construction, …). Weather radar echoes are particularly important in nowcasting because they are very detailed: they continuously capture the size, shape, intensity, speed and direction of movement of individual weather features at good resolution.

State-of-the-art operational nowcasting methods typically advect precipitation fields with radar-based wind estimates using optical flow methods. Recently introduced deep learning methods use radar to directly predict future rain rates, free of physical constraints. While they accurately predict low-intensity rainfall, their operational utility is limited because their lack of constraints produces blurry nowcasts at longer lead times, yielding poor performance on rarer medium-to-heavy rain events.


Challenge goals

The goal of this challenge is to predict future precipitation rate fields (estimated using radar echoes) given the past precipitation rate fields.
The precipitation rate is the volume of water falling per unit area per unit of time; it is usually expressed in mm/h, where 1 mm corresponds to 1 litre/m².
The dataset includes a wide variety of precipitation scenarios and is not representative of the natural frequency of events.

The quality of the predictions will be judged according to the Mean Squared Logarithmic Error metric.
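
The page names the metric without giving its formula. As a minimal sketch, assuming the common definition based on log(1 + x) (the one used by scikit-learn's mean_squared_log_error), it could be computed as follows. The function name msle is illustrative and not part of the challenge material.

```python
import numpy as np


def msle(y_true, y_pred):
    """Mean Squared Logarithmic Error, assuming the log(1 + x) convention."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p(x) = log(1 + x), so zero rain rates are handled without error.
    return float(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))
```

Because the error is measured on a logarithmic scale, a given absolute error weighs more at low rain rates than at high ones.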


Data description

Input:

Radar-based historical precipitation rate: the input vectors x are the last four precipitation rate fields in a square of about 100 kilometers. The period between two fields is 6 minutes and the spatial resolution is 0.01 degrees. An input sample therefore has the following dimensions:
$(Nt_{in}, Ny_{in}, Nx_{in}) = (4, 128, 128)$

Output:

Radar-based future precipitation rate: the output vectors y are the next eight precipitation rate fields, spatially averaged over a square of about 2 kilometers centered on the square of the corresponding input vector. Before averaging, each target has dimension $(Nt_{out}, Ny_{out}, Nx_{out}) = (8, 2, 2)$; averaging over the two spatial dimensions leaves a final dimension of $Nt_{out} = 8$. The period between two fields is 6 minutes and the spatial resolution is 0.01 degrees.
Each entry in these vectors corresponds to the precipitation rate in mm/h multiplied by a factor of $\frac{100}{12}$.
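
To make the construction of the targets concrete, here is a small sketch of the central crop and spatial average, using the same crop arithmetic as the benchmark code further down. The helper names central_mean and to_mm_per_hour are hypothetical, and the conversion assumes that a stored value equals the rate in mm/h multiplied by 100/12.

```python
import numpy as np

IN_SIZE, OUT_SIZE = 128, 2
CROP = (IN_SIZE - OUT_SIZE) // 2  # 63 pixels removed on each side


def central_mean(fields):
    """Average the central 2 x 2 pixels of each field: (Nt, 128, 128) -> (Nt,)."""
    centre = fields[:, CROP:IN_SIZE - CROP, CROP:IN_SIZE - CROP]
    return centre.mean(axis=(1, 2))


def to_mm_per_hour(stored):
    """Undo the 100/12 scaling (assumption: stored = rate_mm_per_h * 100 / 12)."""
    return np.asarray(stored, dtype=float) * 12.0 / 100.0
```

With eight future fields of shape (8, 128, 128), central_mean returns a vector of length 8, matching $Nt_{out}$.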

Data format and content:

Input:
The x_train (or x_test) dataset is provided as a compressed folder containing a collection of .npz files. These files can be loaded as follows:

import numpy as np
sample = np.load('file.npz')

Once loaded, a sample contains the following fields:

sample = {
    'data': precipitation_rate,
    'target_ids': target_ids
}
The data field is a NumPy array of dimension $(Nt_{in}, Ny_{in}, Nx_{in})$, and the target_ids field is a one-dimensional NumPy array of size $Nt_{out}$. The target_ids field links the predictions made from the data field to the IDs in the y_train or y_test files.
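
As a quick sanity check, a loader can verify these shapes before any modelling starts. The sketch below writes a synthetic file with the two documented keys and reads it back; the function name load_sample, the file name and the dtypes are assumptions made for illustration only.

```python
import os
import tempfile

import numpy as np


def load_sample(path):
    """Load one .npz sample and check the documented shapes."""
    with np.load(path) as sample:
        data = sample["data"]
        target_ids = sample["target_ids"]
    if data.shape != (4, 128, 128):
        raise ValueError(f"unexpected data shape: {data.shape}")
    if target_ids.shape != (8,):
        raise ValueError(f"unexpected target_ids shape: {target_ids.shape}")
    return data, target_ids


# Synthetic stand-in for a real challenge file (contents are made up).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "synthetic_sample.npz")
np.savez(path, data=np.zeros((4, 128, 128), dtype=np.float32),
         target_ids=np.arange(8))

data, target_ids = load_sample(path)
```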

Output:
The y_train file is provided as a CSV file containing only two columns:

  • 'ID' corresponds to the id of the time step
  • 'TARGET' corresponds to the precipitation rate of the time step.
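
Since predictions and targets share the 'ID' column, they can be aligned with a join rather than by relying on row order. The sketch below uses small made-up frames; align_on_id is a hypothetical helper, and the integer IDs are an assumption about the ID format.

```python
import pandas as pd


def align_on_id(predictions, targets):
    """Join predictions and targets on 'ID' so rows match regardless of order."""
    merged = predictions.merge(
        targets, on="ID", suffixes=("_pred", "_true"), validate="one_to_one"
    )
    if len(merged) != len(targets):
        raise ValueError("some target IDs have no matching prediction")
    return merged


# Made-up example: same IDs, different row order.
predictions = pd.DataFrame({"ID": [2, 0, 1], "TARGET": [0.0, 1.0, 3.0]})
targets = pd.DataFrame({"ID": [0, 1, 2], "TARGET": [1.5, 2.5, 0.5]})
aligned = align_on_id(predictions, targets)
```

The resulting frame has one row per ID, with the columns TARGET_pred and TARGET_true side by side, ready for scoring.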


Benchmark description

Persistence:

We introduce an extremely simple benchmark: it predicts a sequence that is constant in time and equal to the last available radar observation.

This benchmark is easy to beat and helps participants identify possible errors in their own pipeline.

Here is an implementation of this benchmark:

import os
import numpy as np
import pandas as pd


def benchmark(x_test_dir):
    n_t_out, out_size = 8, 2
    in_size = 128
    # Margin to strip on each side so that only the central 2 x 2 pixels remain.
    crop = (in_size - out_size) // 2
    benchmark_prediction = []
    benchmark_ids = []
    for file in os.listdir(x_test_dir):
        x_test = np.load(os.path.join(x_test_dir, file))
        # Repeat the central patch of the last observed field for every lead time.
        y_bench = np.concatenate([
            x_test['data'][-1:, crop:-crop, crop:-crop]
            for _ in range(n_t_out)
        ])
        # Average over the two spatial dimensions: one value per lead time.
        benchmark_prediction.append(y_bench.mean(axis=(1, 2)))
        benchmark_ids.append(x_test['target_ids'])
    return pd.DataFrame({
        'ID': np.concatenate(benchmark_ids),
        'TARGET': np.concatenate(benchmark_prediction)
    })

Lagrangian Persistence:

More advanced benchmarks can easily be built with the pySTEPS library. Among other things, this library can extrapolate a field along a flow field estimated by optical flow methods. The proposed benchmark is arguably the simplest of this type of approach: it estimates the velocity field using the Lucas-Kanade method and then extrapolates the last precipitation rate field using a semi-Lagrangian scheme.

Here is an implementation of this benchmark:

import os
import numpy as np
import pandas as pd

from pysteps.motion import get_method
from pysteps.nowcasts import extrapolation


def benchmark_pysteps(x_test_dir):
    n_t_out, out_size = 8, 2
    in_size = 128
    # Margin to strip on each side so that only the central 2 x 2 pixels remain.
    crop = (in_size - out_size) // 2
    benchmark_prediction = []
    benchmark_ids = []
    # Lucas-Kanade optical flow estimator.
    pysteps_flow_method = get_method('LK')
    for file in os.listdir(x_test_dir):
        x_test = np.load(os.path.join(x_test_dir, file))
        # Estimate the motion field from the four input fields.
        motion_field = pysteps_flow_method(x_test['data'])
        # Advect the last observed field for n_t_out time steps.
        predictions = extrapolation.forecast(
            x_test['data'][-1, ...],
            motion_field,
            n_t_out
        )
        # Treat undefined pixels in the nowcast as no rain.
        predictions[np.isnan(predictions)] = 0.
        benchmark_prediction.append(
            predictions[:, crop:-crop, crop:-crop].mean(axis=(1, 2))
        )
        benchmark_ids.append(x_test['target_ids'])
    return pd.DataFrame({
        'ID': np.concatenate(benchmark_ids),
        'TARGET': np.concatenate(benchmark_prediction)
    })

This benchmark is the one used for the ranking panel.
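
To illustrate the idea behind the semi-Lagrangian step, here is a deliberately simplified NumPy-only sketch. It is not the pySTEPS implementation: it assumes a single uniform motion vector (u, v) in pixels per time step instead of a flow field estimated with Lucas-Kanade, and it uses a nearest-neighbour lookup. The function name advect_uniform is hypothetical.

```python
import numpy as np


def advect_uniform(field, u, v, n_steps):
    """Backward semi-Lagrangian extrapolation with a uniform motion vector.

    u is the displacement along x (columns) and v along y (rows), both in
    pixels per time step. Pixels whose origin lies outside the domain are
    set to zero.
    """
    ny, nx = field.shape
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    out = np.zeros((n_steps, ny, nx), dtype=float)
    for k in range(1, n_steps + 1):
        # Trace each pixel back along the motion to find where its value came from.
        src_y = np.rint(yy - k * v).astype(int)
        src_x = np.rint(xx - k * u).astype(int)
        inside = (src_y >= 0) & (src_y < ny) & (src_x >= 0) & (src_x < nx)
        out[k - 1][inside] = field[src_y[inside], src_x[inside]]
    return out
```

With u = 1 and v = 0, a rain cell moves one pixel to the right at each lead time, whereas plain persistence would leave it in place.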


Files


Files are accessible when logged in and registered to the challenge


The challenge provider



Plume Labs is a French technology company helping individuals avoid air pollution through three products:

  • Plume Air Report, a mobile app that provides real-time and forecast air quality levels around the world.
  • Flow, our personal air quality tracker that senses pollutants around you.
  • A global atmospheric pollution API, which gives businesses and academic teams access to live air quality data and forecasts.

Our products are powered by our unique atmospheric data platform, based on state-of-the-art geospatial AI models trained on terabytes of data and applied in real time to the latest available measurements.

We are now part of AccuWeather, recognized as the most accurate source of weather forecasts and warnings. With global headquarters in State College, Pennsylvania; a severe weather center in Wichita, Kansas; and offices in New York City and elsewhere around the world, AccuWeather serves more than 1.5 billion people daily, providing them with actionable information about the weather. AccuWeather also helps businesses assess and manage the weather-related risks they face, in particular those linked to climate change.


Congratulations to the winners of the challenge


1 Franck Zibi
2 Waldemar Schulgin
3 Harry Pommier

You can find the whole list of winners of the season here