Veolia Group is the global leader in optimized resource management. With nearly 169,000 employees worldwide, the Group designs and provides water, waste and energy management solutions that contribute to the sustainable development of communities and industries. Through its three complementary business activities, Veolia helps to develop access to resources, preserve available resources, and replenish them.
Veolia aims to provide a technical and objective response to perceptions of odor nuisance around certain wastewater and waste treatment sites. SO2 is a colorless, poisonous gas with a pungent odor, and inhaling it is strongly irritating. It is released into the Earth's atmosphere by volcanoes and by many industrial processes.
Accurate prediction of odor compound concentrations can help adjust industrial processes so that they avoid causing odor nuisance around industrial sites.
Can you predict the concentration of sulfur dioxide (SO2) at one location from a network of sensors?
Using measurement data from the ATMO Normandie sensor network, weather data, and land use data from Copernicus Corine Land Cover (CLC), the goal is to perform multivariate time series forecasting: predict the hourly SO2 concentration in μg/m³ for the next 12 hours at the Le Havre, MAS station from the last 48 hours of observations.
Data description
The dataset contains hourly average concentrations, measured by the fixed sensor network, of the main regulated air pollutants in the Normandy region, including sulfur dioxide (SO2). All concentrations are given in μg/m³ (micrograms per cubic meter). It also contains hourly weather data such as surface temperature, wind speed, wind direction, relative humidity, atmospheric pressure, dew point, and precipitation rate. Finally, it contains the land cover class, which indicates how readily a pollutant plume can disperse given how the land is occupied.
The total volume of data corresponds to one year of historical data. The data is split into a training dataset and a test dataset, and each dataset contains input and output variables. Each sample in the training and test sets corresponds to 48 hours of observations, and each column corresponds to a sensor value at a given hour. One input x_i is described as follows:
ID : row ID
weekday-i : 1<=i<=48, weekday (monday =1, ... , sunday =7) at previous i hour
hour-i : 1<=i<=48, hour at previous i hour
SO2_HRI-i : 1<=i<=48, SO2 measurement at the HRI station in micrograms per cubic meter at previous i hour
SO2_HVH-i : 1<=i<=48, SO2 measurement at the HVH station in micrograms per cubic meter at previous i hour
SO2_STA-i : 1<=i<=48, SO2 measurement at the STA station in micrograms per cubic meter at previous i hour
SO2_CAU-i : 1<=i<=48, SO2 measurement at the CAU station in micrograms per cubic meter at previous i hour
SO2_GOR-i : 1<=i<=48, SO2 measurement at the GOR station in micrograms per cubic meter at previous i hour
SO2_HAR-i : 1<=i<=48, SO2 measurement at the HAR station in micrograms per cubic meter at previous i hour
x_wgs84_HRI-i : 1<=i<=48, X coordinate of the station HRI in the World Geodetic System (WGS) format at previous i hour
x_wgs84_HVH-i : 1<=i<=48, X coordinate of the station HVH in the World Geodetic System (WGS) format at previous i hour
x_wgs84_MAS-i : 1<=i<=48, X coordinate of the station MAS in the World Geodetic System (WGS) format at previous i hour
x_wgs84_STA-i : 1<=i<=48, X coordinate of the station STA in the World Geodetic System (WGS) format at previous i hour
x_wgs84_CAU-i : 1<=i<=48, X coordinate of the station CAU in the World Geodetic System (WGS) format at previous i hour
x_wgs84_GOR-i : 1<=i<=48, X coordinate of the station GOR in the World Geodetic System (WGS) format at previous i hour
x_wgs84_HAR-i : 1<=i<=48, X coordinate of the station HAR in the World Geodetic System (WGS) format at previous i hour
y_wgs84_HRI-i : 1<=i<=48, Y coordinate of the station HRI in the World Geodetic System (WGS) format at previous i hour
y_wgs84_HVH-i : 1<=i<=48, Y coordinate of the station HVH in the World Geodetic System (WGS) format at previous i hour
y_wgs84_MAS-i : 1<=i<=48, Y coordinate of the station MAS in the World Geodetic System (WGS) format at previous i hour
y_wgs84_STA-i : 1<=i<=48, Y coordinate of the station STA in the World Geodetic System (WGS) format at previous i hour
y_wgs84_CAU-i : 1<=i<=48, Y coordinate of the station CAU in the World Geodetic System (WGS) format at previous i hour
y_wgs84_GOR-i : 1<=i<=48, Y coordinate of the station GOR in the World Geodetic System (WGS) format at previous i hour
y_wgs84_HAR-i : 1<=i<=48, Y coordinate of the station HAR in the World Geodetic System (WGS) format at previous i hour
surfaceTemperatureCelsius-i : 1<=i<=48, Temperature in degrees Celsius at previous i hour
surfaceDewpointTemperatureCelsius-i : 1<=i<=48, Dewpoint temperature in degrees Celsius at previous i hour
relativeHumidityPercent-i : 1<=i<=48, relative humidity in % at previous i hour
surfaceAirPressureKilopascals-i : 1<=i<=48, Pressure in Kilopascals at previous i hour
windSpeedKph-i : 1<=i<=48, Windspeed in kilometers per hour at previous i hour
windDirectionDegrees-i : 1<=i<=48, Wind direction in degrees at previous i hour. 0° is a wind blowing from the north.
cloudCoveragePercent-i : 1<=i<=48, Cloud coverage in % at previous i hour
precipitationPreviousHourCentimeters-i : 1<=i<=48, Precipitation in centimeters at previous i hour
directNormalIrradianceWsqm-i : 1<=i<=48, Direct normal solar irradiance in watts per square meter at previous i hour
downwardSolarRadiationWsqm-i : 1<=i<=48, Downward solar irradiance in watts per square meter at previous i hour
diffuseHorizontalRadiationWsqm-i : 1<=i<=48, Diffuse horizontal irradiance (amount of radiation received) in watts per square meter at previous i hour
windChillTemperatureCelsius-i : 1<=i<=48, Wind chill temperature in degrees Celsius at previous i hour
apparentTemperatureCelsius-i : 1<=i<=48, Apparent temperature in degrees Celsius at previous i hour
snowfallCentimeters-i : 1<=i<=48, Snow fall in centimeters at previous i hour
surfaceWindGustsKph-i : 1<=i<=48, Surface wind gust in kilometers per hour at previous i hour
land_cover_class_HVC-i : 1<=i<=48, Land cover class around station HVC at previous i hour
land_cover_class_HAR-i : 1<=i<=48, Land cover class around station HAR at previous i hour
land_cover_class_CAU-i : 1<=i<=48, Land cover class around station CAU at previous i hour
land_cover_class_MAS-i : 1<=i<=48, Land cover class around station MAS at previous i hour
land_cover_class_GOR-i : 1<=i<=48, Land cover class around station GOR at previous i hour
land_cover_class_HRI-i : 1<=i<=48, Land cover class around station HRI at previous i hour
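Because every feature is spread over 48 lagged columns, it can be convenient to stack them into a 3D array before modeling. The following is a minimal sketch, not part of the challenge material: it assumes the columns are named exactly `<feature>-<i>` with an unpadded `i`, as in the list above, and the helper name `wide_to_tensor` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

N_LAGS = 48  # each feature is observed over the previous 48 hours


def wide_to_tensor(df, features, n_lags=N_LAGS):
    """Stack columns named '<feature>-<i>' into an array of shape
    (n_samples, n_lags, n_features), ordered from oldest (i = n_lags)
    to most recent (i = 1) along the time axis.

    Assumption: the column naming follows the list above exactly.
    """
    lags = range(n_lags, 0, -1)  # 48, 47, ..., 1
    blocks = [
        df[[f"{feat}-{i}" for i in lags]].to_numpy(dtype=float)
        for feat in features
    ]
    return np.stack(blocks, axis=-1)


# Toy input with two features and three samples (random values, for shape only).
features = ["SO2_HRI", "windSpeedKph"]
rng = np.random.default_rng(0)
cols = {
    f"{feat}-{i}": rng.random(3)
    for feat in features
    for i in range(1, N_LAGS + 1)
}
toy = pd.DataFrame({"ID": [0, 1, 2], **cols})

X = wide_to_tensor(toy, features)
print(X.shape)
```

With this ordering, `X[:, -1, :]` holds the most recent hour of each sample, which is the layout most sequence models expect.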
The output file contains the 12-hour time series to be predicted, at hourly resolution, from the input. These values correspond to the SO2 concentrations measured over time at the target station. The output file is defined as follows. For each ID of the input dataset, the row with the same ID in the output dataset contains the following quantities y_i:
ID : row ID
SO2_MAS+i : 0<=i<=11, SO2 measurement at the MAS station at i hour ahead in micrograms per cubic meter
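The output layout described above can be sketched as a small helper that turns an array of predictions into a table with one row per ID and the 12 target columns. This is an illustrative sketch only: the helper name `make_output_frame`, the example IDs, and the constant placeholder forecast are assumptions, and the exact file format expected by the challenge is not specified here.

```python
import numpy as np
import pandas as pd

HORIZON = 12  # hours to predict: SO2_MAS+0 ... SO2_MAS+11
TARGET_COLS = [f"SO2_MAS+{i}" for i in range(HORIZON)]


def make_output_frame(ids, predictions):
    """Build a table with an ID column followed by the 12 target columns.

    `predictions` must have shape (len(ids), HORIZON), in micrograms
    per cubic meter.
    """
    predictions = np.asarray(predictions, dtype=float)
    if predictions.shape != (len(ids), HORIZON):
        raise ValueError(
            f"expected shape {(len(ids), HORIZON)}, got {predictions.shape}"
        )
    out = pd.DataFrame(predictions, columns=TARGET_COLS)
    out.insert(0, "ID", list(ids))
    return out


# Placeholder forecast: the same arbitrary value for every hour and sample.
ids = [10, 11, 12]
constant_forecast = np.full((len(ids), HORIZON), 2.5)
y_pred = make_output_frame(ids, constant_forecast)
print(y_pred.head())
```

Checking the shape up front catches a common mistake, namely predicting a single value per sample instead of the full 12-hour horizon.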
The input test dataset will have the following form: 48 columns for each feature time series: