From Jan. 1, 2019 to Jan. 1, 2020
Planète OUI has to offer a green electricity supply with prices adapted to the consumption profiles of its clients. The electricity is bought on the electricity markets, where prices vary widely over the course of a day. A site with a high share of consumption when prices are low, for example at mid-afternoon, will be supplied at lower cost than a site consuming electricity mainly during peak price intervals. The consumption profile of an installation therefore has to be assessed to compute the best estimate of supply tariffs, allowing Planète OUI to meet its running costs while supplying its client at the best price.
Sites with a metering power larger than 250 kVA are subject to a precise analysis methodology. Most sites are characterized by a consumption that varies strongly with temperature because of electrical heating systems. French national electricity consumption is the most temperature-sensitive in Europe, increasing by 2.4 GW for each degree of temperature drop, which is the output of 2 or 3 nuclear reactors. Nonetheless, industrial installations are specific cases, because their consumption might be largely driven by non-thermosensitive uses, for instance chemical or metallurgical processes. For each site, the first objective is therefore to analyze its thermosensitive uses. This alone, however, is not enough to determine the consumption profiles precisely, because other factors also affect consumption, for example annual, weekly and daily seasonalities.
When Planète OUI prepares an electricity supply offer, it receives historical consumption data from the potential client. These profiles are combined with electricity price simulations to compute a distribution of supply costs in €/MWh. A given percentile of this distribution is then used to cover supply costs for a wide range of price scenarios. However, the client’s data is often incomplete and spread over a relatively short period, which is rarely longer than a year.
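The cost computation described above can be illustrated with a minimal sketch. It assumes that the hourly consumption profile and the simulated prices are already aligned hour by hour; the function name `supply_cost_percentile`, its parameters and the default percentile are hypothetical and are not taken from Planète OUI's actual pricing tools.

```python
import numpy as np


def supply_cost_percentile(consumption_mwh, price_scenarios, percentile=90.0):
    """Consumption-weighted supply cost (EUR/MWh) for each price scenario.

    consumption_mwh: shape (n_hours,), hourly consumption of the site.
    price_scenarios: shape (n_scenarios, n_hours), simulated hourly prices
                     in EUR/MWh (assumed input, one row per scenario).
    Returns the cost of every scenario and the requested percentile.
    """
    consumption_mwh = np.asarray(consumption_mwh, dtype=float)
    price_scenarios = np.asarray(price_scenarios, dtype=float)
    # Total cost of each scenario divided by total energy gives EUR/MWh.
    costs = price_scenarios @ consumption_mwh / consumption_mwh.sum()
    return costs, float(np.percentile(costs, percentile))
```

With such a weighting, a profile concentrated on low-price hours yields a lower cost per MWh than a profile concentrated on peak hours, for the same price scenarios.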
To obtain a more precise supply pricing, the goal of Planète OUI is to develop a machine learning model that rebuilds one or several years of extrapolated consumption data from a single year of measured data supplied by the client. These extrapolated profiles will also be combined with electricity prices, in order to obtain a larger data set for analysis.
The goal of the challenge is to predict the electricity consumption of two given sites for a test year, based on an analysis of the correlation between consumption and weather data over a training year. In operational conditions, the new consumption profiles would be integrated into the electricity supply pricing analysis.
The input data of the model to be developed consists of the following columns, recorded on an hourly basis:
- "ID": Data point ID;
- "timestamp": Complete timestamps with year, month, day and hour, in local time (CET and CEST);
- "temp1", "temp2", "meannationaltemp": Local and mean national temperatures (°C);
- "humidity1", "humidity2": Local relative humidities (%);
- "loc1", "loc2", "locsecondary1", "locsecondary2", "locsecondary3": the coordinates of the studied and secondary sites, in decimal degrees and of the form (latitude, longitude);
- "consumptionsecondary1", "consumptionsecondary2", "consumptionsecondary3": the consumption data of three secondary sites, whose correlations with the studied sites may be of use (kWh). Indeed, the two studied sites and the three secondary sites are used for the same purposes.
The output data of the model to be developed takes the following form:
- "ID": Data point ID;
- "consumption1", "consumption2": the consumption data of the two studied sites (kWh).
Relative humidities are provided with temperature data because they represent variables of importance for electricity consumption: humidity indeed strongly influences thermal comfort.
To replicate operational conditions, some temperature and humidity data points will be missing. The imputation method must be carefully considered.
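One possible imputation strategy is sketched below, purely as an illustration. It assumes the data frame is indexed by a pandas `DatetimeIndex` with unique hourly timestamps; the function name `impute_weather` and the `max_gap_hours` threshold are hypothetical choices, not part of the challenge specification.

```python
import pandas as pd


def impute_weather(df, cols=("temp1", "temp2", "humidity1", "humidity2"),
                   max_gap_hours=6):
    """Fill missing weather values in two passes (illustrative only).

    Assumes df is indexed by a DatetimeIndex with hourly, unique timestamps.
    """
    out = df.copy()
    for col in cols:
        # Pass 1: time-weighted linear interpolation between known values,
        # filling at most max_gap_hours consecutive missing points.
        out[col] = out[col].interpolate(
            method="time", limit=max_gap_hours, limit_area="inside"
        )
        # Pass 2: whatever is still missing takes the mean observed
        # at the same hour of day over the whole series.
        hourly_mean = out.groupby(out.index.hour)[col].transform("mean")
        out[col] = out[col].fillna(hourly_mean)
    return out
```

Short gaps are well served by interpolation because temperature and humidity evolve smoothly from hour to hour, while longer gaps may be better handled with the second local sensor or the national mean temperature as predictors.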
The "consumptionsecondaryi" variables are the consumption data of several sites of Planète OUI's portfolio with a metering power higher than 250 kVA. The correlation between the consumptions of the various sites should be studied to refine data completion or interpolation.
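As a starting point for this study, the sketch below ranks the secondary sites by the strength of their linear (Pearson) correlation with a studied site. It assumes the training inputs and outputs have been merged into one data frame; the function name `rank_secondary_sites` is hypothetical.

```python
import pandas as pd


def rank_secondary_sites(df, target="consumption1",
                         secondary=("consumptionsecondary1",
                                    "consumptionsecondary2",
                                    "consumptionsecondary3")):
    """Pearson correlation of each secondary site with the target site,
    ordered from the strongest to the weakest absolute correlation."""
    # corrwith ignores rows where either series is missing.
    corr = df[list(secondary)].corrwith(df[target])
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)
```

A high correlation only indicates a linear relationship over the training year; whether it is stable enough to support data completion remains to be checked.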
Timestamps may be expressed as month or day of year, day of week and hour, to study the impact of annual, weekly and daily seasonalities. Particular attention should be paid to the processing of national holidays.
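The decomposition above could be sketched as follows. The code assumes that the "timestamp" column holds naive local timestamps that `pandas.to_datetime` can parse. The function name, the output column names and the `extra_holidays` parameter are hypothetical. Only the French public holidays that fall on a fixed date are hard-coded; movable ones (Easter Monday, Ascension Thursday, Whit Monday) must be passed explicitly.

```python
import pandas as pd

# French public holidays with a fixed (month, day); movable ones are not listed.
FIXED_FRENCH_HOLIDAYS = {(1, 1), (5, 1), (5, 8), (7, 14),
                         (8, 15), (11, 1), (11, 11), (12, 25)}


def add_calendar_features(df, extra_holidays=()):
    """Add seasonality features derived from the "timestamp" column.

    extra_holidays: dates of movable holidays, supplied by the caller.
    """
    ts = pd.to_datetime(df["timestamp"])
    out = df.copy()
    out["month"] = ts.dt.month
    out["day_of_year"] = ts.dt.dayofyear
    out["day_of_week"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["hour"] = ts.dt.hour
    out["is_weekend"] = ts.dt.dayofweek >= 5
    fixed = pd.Series(
        [(m, d) in FIXED_FRENCH_HOLIDAYS for m, d in zip(ts.dt.month, ts.dt.day)],
        index=df.index,
    )
    movable = ts.dt.normalize().isin(pd.to_datetime(list(extra_holidays)))
    out["is_holiday"] = fixed | movable
    return out
```

For example, `add_calendar_features(df, extra_holidays=["2017-06-05"])` would also flag Whit Monday 2017 as a holiday.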
Persistence is a fast and relatively accurate benchmark. It consists in assuming that the hourly consumption of year y+1 is equal to that of year y, shifting the data so that weekends and public holidays are aligned. For instance, the benchmark method would consider that the hourly consumption of Saturday, August 4, 2018 is equal to that of Saturday, August 5, 2017.
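A minimal sketch of such a benchmark is given below; it is not the organizers' reference implementation. It assumes a shift of exactly 52 weeks (364 days), which keeps the day of week aligned and reproduces the example above. The `holiday_map` parameter is a hypothetical way to override that shift for public holidays. Timestamps are assumed to be naive and unique, so the duplicated or missing hours of CET/CEST transitions need to be handled beforehand.

```python
import pandas as pd


def persistence_forecast(history, target_index, holiday_map=None):
    """Predict year y+1 from year y with a weekday-preserving shift.

    history: Series of hourly consumption indexed by unique naive timestamps.
    target_index: DatetimeIndex of the hours to predict.
    holiday_map: optional {target_date: source_date} pairs; hours of a target
                 holiday are copied from the given source date instead.
    """
    # Default rule: same weekday and hour, 52 weeks earlier.
    source = (target_index - pd.Timedelta(weeks=52)).to_numpy().copy()
    days = target_index.normalize()
    for target_day, source_day in (holiday_map or {}).items():
        mask = days == pd.Timestamp(target_day)
        offset = pd.Timestamp(target_day) - pd.Timestamp(source_day)
        source[mask] = (target_index[mask] - offset).to_numpy()
    # Hours without a source value in the history come out as NaN.
    values = history.reindex(pd.DatetimeIndex(source)).to_numpy()
    return pd.Series(values, index=target_index)
```

With the default shift, the hours of Saturday, August 4, 2018 are copied from Saturday, August 5, 2017; passing `holiday_map={"2018-07-14": "2017-07-14"}` would instead copy the national holiday from the same calendar date of the previous year.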