Challenge Data

Predicting odor compound concentrations
by VEOLIA



Description


Competitive challenge
Physics
Environment
Health
Regression
Spatio-temporal series
10MB to 1GB
Intermediary level

Dates

Started on Jan. 5, 2022


Challenge context

Veolia group is the global leader in optimized resource management. With nearly 169,000 employees worldwide, the Group designs and provides water, waste and energy management solutions that contribute to the sustainable development of communities and industries. Through its three complementary business activities, Veolia helps to develop access to resources, preserve available resources, and replenish them.

Veolia's objective is to provide a technical and objective response to perceptions of odor nuisance around certain wastewater and waste treatment sites. Sulfur dioxide (SO2) is a colorless, poisonous gas with a pungent odor, and inhaling it is strongly irritating. It is released into the Earth's atmosphere by volcanoes and by many industrial processes.

Accurate predictions of odor compound concentrations can help adjust industrial processes so that they avoid causing odor nuisance around industrial sites.

Contacts:

  • Anne-Sophie Guilbert
  • Yannick Deleuze
  • email: fr.veri.challenge-ens.int.groups@veolia.com


Challenge goals

Can you predict the concentration of sulfur dioxide (SO2) at one location from a network of sensors?

Using measurement data from the ATMO Normandie sensor network, weather data, and land use data from Copernicus Corine Land Cover (CLC), the goal is to perform multivariate time series forecasting: predict the hourly SO2 concentration in μg / m³ for the next 12 hours at the Le Havre MAS station from the data of the previous 48 hours.


Data description

The dataset contains hourly average concentrations, measured by the fixed sensor network, of the main regulated air pollutants in the Normandy region, including sulfur dioxide (SO2). All concentrations are given in μg / m³ (micrograms per cubic meter). It also contains hourly values for weather data such as surface temperature, wind speed, wind direction, relative humidity, atmospheric pressure, dew point, and precipitation rate. Finally, it contains the land cover class, which indicates how readily a pollutant plume can disperse given the way the land is occupied.

The total volume of data corresponds to a year of historical data. The data is split into a training dataset and a test dataset, and each dataset contains input and output variables. Each sample in the training and test sets corresponds to 48 hourly observations, and each column corresponds to a sensor value at a given hour. One input x_i is described as follows:

  • ID : row ID
  • weekday-i : 1 <= i <= 48, weekday (Monday = 1, ..., Sunday = 7) at the i-th previous hour
  • hour-i : 1 <= i <= 48, hour of the day at the i-th previous hour
  • SO2_HRI-i : 1 <= i <= 48, SO2 measurement at the HRI station in micrograms per cubic meter at the i-th previous hour
  • SO2_HVH-i : 1 <= i <= 48, SO2 measurement at the HVH station in micrograms per cubic meter at the i-th previous hour
  • SO2_STA-i : 1 <= i <= 48, SO2 measurement at the STA station in micrograms per cubic meter at the i-th previous hour
  • SO2_CAU-i : 1 <= i <= 48, SO2 measurement at the CAU station in micrograms per cubic meter at the i-th previous hour
  • SO2_GOR-i : 1 <= i <= 48, SO2 measurement at the GOR station in micrograms per cubic meter at the i-th previous hour
  • SO2_HAR-i : 1 <= i <= 48, SO2 measurement at the HAR station in micrograms per cubic meter at the i-th previous hour
  • x_wgs84_HRI-i : 1 <= i <= 48, X coordinate of the HRI station in the World Geodetic System (WGS) format at the i-th previous hour
  • x_wgs84_HVH-i : 1 <= i <= 48, X coordinate of the HVH station in the World Geodetic System (WGS) format at the i-th previous hour
  • x_wgs84_MAS-i : 1 <= i <= 48, X coordinate of the MAS station in the World Geodetic System (WGS) format at the i-th previous hour
  • x_wgs84_STA-i : 1 <= i <= 48, X coordinate of the STA station in the World Geodetic System (WGS) format at the i-th previous hour
  • x_wgs84_CAU-i : 1 <= i <= 48, X coordinate of the CAU station in the World Geodetic System (WGS) format at the i-th previous hour
  • x_wgs84_GOR-i : 1 <= i <= 48, X coordinate of the GOR station in the World Geodetic System (WGS) format at the i-th previous hour
  • x_wgs84_HAR-i : 1 <= i <= 48, X coordinate of the HAR station in the World Geodetic System (WGS) format at the i-th previous hour
  • y_wgs84_HRI-i : 1 <= i <= 48, Y coordinate of the HRI station in the World Geodetic System (WGS) format at the i-th previous hour
  • y_wgs84_HVH-i : 1 <= i <= 48, Y coordinate of the HVH station in the World Geodetic System (WGS) format at the i-th previous hour
  • y_wgs84_MAS-i : 1 <= i <= 48, Y coordinate of the MAS station in the World Geodetic System (WGS) format at the i-th previous hour
  • y_wgs84_STA-i : 1 <= i <= 48, Y coordinate of the STA station in the World Geodetic System (WGS) format at the i-th previous hour
  • y_wgs84_CAU-i : 1 <= i <= 48, Y coordinate of the CAU station in the World Geodetic System (WGS) format at the i-th previous hour
  • y_wgs84_GOR-i : 1 <= i <= 48, Y coordinate of the GOR station in the World Geodetic System (WGS) format at the i-th previous hour
  • y_wgs84_HAR-i : 1 <= i <= 48, Y coordinate of the HAR station in the World Geodetic System (WGS) format at the i-th previous hour
  • surfaceTemperatureCelsius-i : 1 <= i <= 48, temperature in degrees Celsius at the i-th previous hour
  • surfaceDewpointTemperatureCelsius-i : 1 <= i <= 48, dew point temperature in degrees Celsius at the i-th previous hour
  • relativeHumidityPercent-i : 1 <= i <= 48, relative humidity in % at the i-th previous hour
  • surfaceAirPressureKilopascals-i : 1 <= i <= 48, pressure in kilopascals at the i-th previous hour
  • windSpeedKph-i : 1 <= i <= 48, wind speed in kilometers per hour at the i-th previous hour
  • windDirectionDegrees-i : 1 <= i <= 48, wind direction in degrees at the i-th previous hour. 0° is a wind blowing from the north.
  • cloudCoveragePercent-i : 1 <= i <= 48, cloud coverage in % at the i-th previous hour
  • precipitationPreviousHourCentimeters-i : 1 <= i <= 48, precipitation in centimeters at the i-th previous hour
  • directNormalIrradianceWsqm-i : 1 <= i <= 48, direct normal solar irradiance in watts per square meter at the i-th previous hour
  • downwardSolarRadiationWsqm-i : 1 <= i <= 48, downward solar irradiance in watts per square meter at the i-th previous hour
  • diffuseHorizontalRadiationWsqm-i : 1 <= i <= 48, diffuse horizontal irradiance (amount of radiation received) in watts per square meter at the i-th previous hour
  • windChillTemperatureCelsius-i : 1 <= i <= 48, wind chill temperature in degrees Celsius at the i-th previous hour
  • apparentTemperatureCelsius-i : 1 <= i <= 48, apparent temperature in degrees Celsius at the i-th previous hour
  • snowfallCentimeters-i : 1 <= i <= 48, snowfall in centimeters at the i-th previous hour
  • surfaceWindGustsKph-i : 1 <= i <= 48, surface wind gusts in kilometers per hour at the i-th previous hour
  • land_cover_class_HVC-i : 1 <= i <= 48, land cover class around the HVC station at the i-th previous hour
  • land_cover_class_HAR-i : 1 <= i <= 48, land cover class around the HAR station at the i-th previous hour
  • land_cover_class_CAU-i : 1 <= i <= 48, land cover class around the CAU station at the i-th previous hour
  • land_cover_class_MAS-i : 1 <= i <= 48, land cover class around the MAS station at the i-th previous hour
  • land_cover_class_GOR-i : 1 <= i <= 48, land cover class around the GOR station at the i-th previous hour
  • land_cover_class_HRI-i : 1 <= i <= 48, land cover class around the HRI station at the i-th previous hour

The output file contains the 12-hour time series to be predicted from the input, at hourly resolution. These values are the SO2 concentrations measured over time at the target station. The output file is defined as follows: for each ID in the input dataset, the row with the same ID in the output dataset contains the following quantities y_i:

  • ID : row ID
  • SO2_MAS+i : 0 <= i <= 11, SO2 measurement at the MAS station i hours ahead, in micrograms per cubic meter

The input test dataset has the following form, with 48 columns for each feature time series:

ID, feature1-48, ..., feature1-1, feature2-48, ..., feature2-1, ..., featureN-48, ..., featureN-1
1,...
2,...
3,...
...
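
The column layout above can be turned back into one 48-step sequence per row. The following is a minimal sketch using only the Python standard library. It assumes that every column name ends in "-<lag>" exactly as listed above and that empty cells denote missing values; the helper names (split_column, to_float, rows_to_sequences) are hypothetical and not part of the challenge material.

```python
import csv
import io


def split_column(name):
    """Split a column name such as 'windSpeedKph-3' into ('windSpeedKph', 3)."""
    feature, lag = name.rsplit("-", 1)
    return feature, int(lag)


def to_float(value):
    """Convert a CSV cell to float; empty cells become NaN (an assumption)."""
    return float(value) if value.strip() else float("nan")


def rows_to_sequences(csv_text, n_lags=48):
    """Return (ids, features, sequences) from wide-format CSV text.

    sequences[r][t][f] is the value of feature f for row r at time step t,
    where t = 0 is the oldest hour (lag n_lags) and t = n_lags - 1 is the
    most recent hour (lag 1).
    """
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)

    # Map each (feature, lag) pair to its column index; column 0 is the ID.
    position = {}
    features = []
    for col, name in enumerate(header[1:], start=1):
        feature, lag = split_column(name.strip())
        position[(feature, lag)] = col
        if feature not in features:
            features.append(feature)

    ids, sequences = [], []
    for row in reader:
        ids.append(row[0])
        sequences.append([
            [to_float(row[position[(f, lag)]]) for f in features]
            for lag in range(n_lags, 0, -1)  # oldest hour first
        ])
    return ids, features, sequences
```

For a real file, the text could be read with open(path).read() and passed to rows_to_sequences; the resulting nested lists have shape (rows, 48, N) and can be converted to an array by the library of your choice.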

The output test data uses the same IDs, with each row containing the 12 hourly values to predict:

ID, SO2_MAS+0, SO2_MAS+1, ..., SO2_MAS+11 
1,...
2,...
3,...
...
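
As an illustration of the output layout above, here is a small sketch that writes predictions in that form with the standard csv module. The function name write_predictions is hypothetical, and writing the header without spaces after the commas is an assumption; check the sample files provided with the challenge for the exact expected format.

```python
import csv
import io

HORIZONS = 12  # hours to predict, SO2_MAS+0 through SO2_MAS+11
HEADER = ["ID"] + [f"SO2_MAS+{h}" for h in range(HORIZONS)]


def write_predictions(ids, predictions, out):
    """Write one row per ID with its 12 predicted values to the file-like `out`."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for row_id, values in zip(ids, predictions):
        if len(values) != HORIZONS:
            raise ValueError(f"row {row_id}: expected {HORIZONS} values, got {len(values)}")
        writer.writerow([row_id, *values])


# Example: a single row of constant predictions written to an in-memory buffer.
buffer = io.StringIO()
write_predictions([1], [[0.5] * HORIZONS], buffer)
```

To write to disk instead, pass a file opened with open(path, "w", newline="") as the out argument.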


Benchmark description

Evaluation metric

The metric used is the MSE (Mean Squared Error).
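
For reference, a plain-Python sketch of the MSE over a table of predictions follows. The page does not say how the error is aggregated across rows and forecast hours, so this sketch assumes a uniform average over every (row, hour) pair; the function name is illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences over all rows and all forecast hours."""
    total = 0.0
    count = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("rows must have the same number of forecast hours")
        for t, p in zip(true_row, pred_row):
            total += (t - p) ** 2
            count += 1
    if count == 0:
        raise ValueError("no values to score")
    return total / count
```

Because errors are squared, a few badly missed concentration peaks can dominate the score.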

Benchmark

The benchmark was obtained with an LSTM network with dropout, implemented in Keras and trained with the 'mse' loss and the Adam optimizer (learning rate lr=1e-3), using the following hyperparameters:

epochs = 100
batch_size = 512
nb_LSTM_layers = 1
nb_units = 30
dropout_rate = 0.2
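
A model with these settings might look like the sketch below. This is not the organizers' code: the page does not say where dropout is applied or what the output layer looks like, so the Dropout layer after the LSTM, the Dense(12) output, and the build_model helper are all assumptions. TensorFlow is imported inside the function so that the configuration can be inspected without it installed.

```python
CONFIG = {
    "epochs": 100,
    "batch_size": 512,
    "nb_LSTM_layers": 1,
    "nb_units": 30,
    "dropout_rate": 0.2,
    "learning_rate": 1e-3,
}


def build_model(n_features, n_lags=48, horizons=12, config=CONFIG):
    """Build and compile an LSTM regressor mapping (n_lags, n_features) to `horizons` values."""
    from tensorflow import keras  # requires TensorFlow to be installed

    model = keras.Sequential()
    model.add(keras.Input(shape=(n_lags, n_features)))
    for layer in range(config["nb_LSTM_layers"]):
        is_last = layer == config["nb_LSTM_layers"] - 1
        # Intermediate LSTM layers must return full sequences for the next LSTM.
        model.add(keras.layers.LSTM(config["nb_units"], return_sequences=not is_last))
        # Assumption: dropout is applied to the output of each LSTM layer.
        model.add(keras.layers.Dropout(config["dropout_rate"]))
    # Assumption: one linear output per forecast hour (SO2_MAS+0 ... SO2_MAS+11).
    model.add(keras.layers.Dense(horizons))
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=config["learning_rate"]),
        loss="mse",
    )
    return model
```

Training would then be a call such as model.fit(x_train, y_train, epochs=CONFIG["epochs"], batch_size=CONFIG["batch_size"]), where x_train has shape (rows, 48, n_features) and y_train has shape (rows, 12).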


Files


Files are accessible when logged in and registered to the challenge


The challenge provider


Veolia Research and Innovation


Congratulations to the winners of the challenge


1 Maurice Tia
2 graillou
3 Christophe Leroux
