Started on Jan. 4, 2021
Created in 2006, Oze-Energies is an innovative company specialized in instrumented energy optimization of existing commercial buildings. Thousands of communicating sensors, coupled with monitoring and energy optimization software, make it possible to measure and store a large amount of data (temperatures, consumptions, schedules, etc.) continuously and in real time. Using data accumulated over a few weeks and its statistical learning algorithms, Oze-Energies models the energy behavior of each building. Oze-Energies experts then identify and evaluate improvement actions that preserve comfort and require no construction work, partly by adjusting the settings of climatic equipment (heating, ventilation and air conditioning) and partly by resizing energy contracts. These actions reduce the energy bill of owners and tenants by about 25% per year on average.
Maurice Charbit (mch@oze-energies.com)
Max Cohen (max.cohen@telecom-sudparis.eu)
Sylvain Le Corff (sylvain.lecorff@gmail.com)
This data challenge aims at introducing a new statistical model to predict and analyze air quality in large buildings using observations stored in the Oze-Energies database. Physics-based approaches, which build air quality simulation tools to reproduce complex building behaviors, are widespread in the most complex situations. The main drawbacks of such software for simulating the behavior of transient systems are:
In order to analyze and predict future air quality, so as to alert and correct building management systems and ensure comfort and satisfactory sanitary conditions, this challenge aims at solving issue (ii), i.e. at designing models that take into account the uncertainty in the exogenous data describing external weather conditions and the occupancy of the building. This makes it possible to provide confidence intervals on the air quality predictions, here on the humidity of the air inside the building.
The data are decomposed into a training dataset and a test dataset, and each dataset contains input and output variables. Each sample in the training and test sets corresponds to one week of hourly observations, and each column corresponds to a sensor value at a given hour during the week. The training and test input files each gather a number of distinct weeks. The input file collects building management system values (such as the air handling unit settings) together with several forecasts of the outside temperature and relative humidity for one week. One input is described as follows.
The output file contains the time series to be predicted hourly from the input. These correspond to the predictions of the air quality inside the building and of the outside temperature and relative humidity obtained from the input. The output file is defined as follows: for each Id of the input dataset, the same Id of the output dataset contains the following quantities, considered as part of the air quality index (AQI).
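As an illustration, here is a minimal loading sketch. The file names are hypothetical (the actual names are listed with the challenge files); only the shared Id column is taken from the description above.

```python
import pandas as pd

# Hypothetical file names; the actual names are listed with the challenge files.
x_train = pd.read_csv("x_train.csv", index_col="Id")
y_train = pd.read_csv("y_train.csv", index_col="Id")

# Each row is one week of hourly values; input and output rows share an Id.
assert (x_train.index == y_train.index).all()
```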
For any time step $t$ (here $t \in \{1,\dots,168\}$, one week of hourly observations) of any sample in the test set and each output $y^k_t$, we ask the model to provide a lower bound $\hat{y}^k_{L,t}$ and an upper bound $\hat{y}^k_{U,t}$ forming a 95% confidence interval. To do so, we propose to format the prediction as follows: first provide all the lower bounds $\hat{y}_L$ (with data in the same order as in the output file) and then all the upper bounds $\hat{y}_U$. Therefore the prediction contains first all the lower bounds and then all the upper bounds.
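As a rough sketch of how such a submission could be assembled (the exact file layout is not fully specified above, so the shapes and the stacking along rows are assumptions):

```python
import numpy as np

# Hypothetical shapes: one row per predicted series of the test output file,
# one column per hourly time step of the week.
preds_low = np.zeros((10, 168))   # placeholder lower bounds
preds_up = preds_low + 1.0        # placeholder upper bounds

# All lower bounds first (same order as the output file), then all upper bounds.
submission = np.concatenate([preds_low, preds_up], axis=0)
```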
The performance of the model is assessed by analyzing the predictions based on the input variables of the test file. We use the Prediction Interval Coverage Probability (PICP) and the captured Mean Prediction Interval Width (MPIW) of [Pearce et al., 2018], see equation (15) of https://arxiv.org/pdf/1802.07167.pdf. For any time step $t$ of any sample in the test set and each output $y_t$, the model provides a lower bound $\hat{y}_{L,t}$ and an upper bound $\hat{y}_{U,t}$, and the PICP and the captured MPIW are computed as
$$k_t = \mathbb{1}\{\hat{y}_{L,t} \le y_t \le \hat{y}_{U,t}\}, \qquad \mathrm{PICP} = \frac{1}{T}\sum_{t=1}^{T} k_t, \qquad \mathrm{MPIW}_{\mathrm{capt}} = \frac{\sum_{t=1}^{T} (\hat{y}_{U,t} - \hat{y}_{L,t})\, k_t}{\sum_{t=1}^{T} k_t},$$
with $T = 168$.
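The following sketch shows how these two quantities can be computed for one sample with NumPy; the function name and the toy data are ours, not part of the challenge.

```python
import numpy as np

def picp_mpiw_capt(y_true, y_low, y_up):
    """PICP and captured MPIW for one sample.

    y_true, y_low, y_up: arrays of shape (T,) holding the target series
    and the predicted lower/upper bounds at each hourly time step.
    """
    captured = (y_low <= y_true) & (y_true <= y_up)   # k_t indicator
    picp = captured.mean()
    widths = y_up - y_low
    # average interval width over the captured points only
    mpiw_capt = widths[captured].mean() if captured.any() else 0.0
    return picp, mpiw_capt

# Toy usage on a flat series with a constant-width interval.
y = np.zeros(168)
print(picp_mpiw_capt(y, y - 0.5, y + 0.5))   # (1.0, 1.0)
```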
The aim is then to minimize $\mathrm{MPIW}_{\mathrm{capt}}$ under the constraint $\mathrm{PICP} \ge 1 - \alpha$, with $\alpha = 0.05$ for 95% intervals. Therefore, following [Pearce et al., 2018], we consider the following penalized loss function for one sample:
$$\mathcal{L} = \mathrm{MPIW}_{\mathrm{capt}} + \lambda \, \max\bigl(0, (1 - \alpha) - \mathrm{PICP}\bigr)^2,$$
where $\lambda > 0$ is a penalty weight. The total loss is the mean of this loss over all samples.
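A minimal NumPy version of this per-sample loss could look as follows; the penalty weight lam is a placeholder, since its official value is not given above.

```python
def penalized_loss(y_true, y_low, y_up, alpha=0.05, lam=1.0):
    """Penalized loss for one sample, in the spirit of eq. (15) of
    Pearce et al. (2018). lam is a placeholder penalty weight; the value
    used for the official scoring is not stated in the text."""
    captured = (y_low <= y_true) & (y_true <= y_up)
    picp = captured.mean()
    mpiw_capt = (y_up - y_low)[captured].mean() if captured.any() else 0.0
    # The penalty is active only when coverage falls below the 1 - alpha target.
    return mpiw_capt + lam * max(0.0, (1.0 - alpha) - picp) ** 2
```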
The benchmark was obtained using an LSTM network with dropout, implemented in PyTorch, with the torch.nn.MSELoss() loss and the Adam optimizer with learning rate lr=1e-1.
During the test phase, 1000 stochastic runs of the LSTM (with dropout kept active) were used to produce 1000 samples for each data point to be predicted. These samples were then used to build a confidence interval and produce the lower and upper bounds.
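Putting the two previous paragraphs together, here is a hedged PyTorch sketch of such a benchmark: an LSTM with dropout trained with MSELoss and Adam at lr=1e-1, then sampled 1000 times at test time with dropout kept active (MC dropout). All layer sizes and input/output dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class DropoutLSTM(nn.Module):
    """Sketch of the benchmark: an LSTM regressor with dropout.
    Layer sizes and dimensions are hypothetical; the text only fixes
    the loss, the optimizer and the learning rate."""
    def __init__(self, n_inputs, n_outputs, hidden=64, p=0.2):
        super().__init__()
        self.lstm = nn.LSTM(n_inputs, hidden, batch_first=True)
        self.drop = nn.Dropout(p)
        self.head = nn.Linear(hidden, n_outputs)

    def forward(self, x):                   # x: (batch, time, n_inputs)
        h, _ = self.lstm(x)
        return self.head(self.drop(h))      # (batch, time, n_outputs)

model = DropoutLSTM(n_inputs=8, n_outputs=4)    # hypothetical dimensions
optimizer = torch.optim.Adam(model.parameters(), lr=1e-1)
loss_fn = nn.MSELoss()

# One training step on a dummy batch of weekly series (168 hourly steps).
x = torch.randn(16, 168, 8)
y = torch.randn(16, 168, 4)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

# MC dropout at test time: keep dropout active and draw 1000 stochastic runs.
model.train()                               # leaves the Dropout layer on
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(1000)])
lower = samples.quantile(0.025, dim=0)      # 2.5% quantile -> lower bounds
upper = samples.quantile(0.975, dim=0)      # 97.5% quantile -> upper bounds
```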
Files are accessible once logged in and registered for the challenge.