Created in 2006, Oze-Energies is an innovative company specialized in instrumented energy optimization of existing commercial buildings. Thousands of communicating sensors, coupled with monitoring and energy optimization software, measure and store a large volume of data (temperatures, consumption, equipment schedules, etc.) continuously and in real time. Using data accumulated over a few weeks and its own statistical learning algorithms, Oze-Energies models the energy behavior of each building. Oze-Energies experts then identify and evaluate improvement actions that maintain the same level of comfort and require no construction work. These actions bear on the settings of climate control equipment (heating, ventilation and air conditioning) on the one hand, and on the resizing of energy contracts on the other. They reduce the energy bill of owners and tenants by about 25% per year on average.
Contacts
Maurice Charbit (mch@oze-energies.com)
Max Cohen (max.cohen@telecom-sudparis.eu)
Sylvain Le Corff (sylvain.lecorff@gmail.com)
Challenge goals
This data challenge aims to introduce a new statistical model to predict and analyze air quality in large buildings using observations stored in the Oze-Energies database. Physics-based air quality simulation tools are widely used to simulate complex building behaviors, including in the most complex situations. The main drawbacks of such software for simulating the behavior of transient systems are:
i) the significant computational time required to run such models, as they integrate many noisy sources and a huge number of parameters and essentially require massive thermodynamics computations;
ii) the fact that they often output only a single point estimate at each time, without providing any uncertainty measure to assess the confidence of their predictions.
The purpose of analyzing and predicting future air quality is to alert and correct building management systems so that they ensure comfort and satisfactory sanitary conditions. To that end, this challenge aims at solving issue ii), i.e. at designing models which take into account the uncertainty in the exogenous data describing external weather conditions and the occupation of the building. This makes it possible to provide confidence intervals on the air quality predictions, here on the humidity of the air inside the building.
Data description
The data are split into a training dataset and a test dataset, and each dataset contains input and output variables.
Each sample in the training and test sets corresponds to one week of hourly observations, and each column corresponds to a sensor value at a given hour during the week. The input file contains 40 different weeks and the test file contains 12 different weeks.
The input file gathers building management system values (for example from the air handling unit) and several forecasts of the outside temperature and relative humidity for one week. One input x_i is described as follows.
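To make the "one week of hourly observations" layout concrete, here is a minimal sketch that splits one flat sample row into per-sensor series of 7 x 24 = 168 hourly values. The function name split_week_row, the number of sensors and the sensor-major column order are assumptions made for illustration only; the actual column layout should be checked against the challenge files.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hourly observations in one sample


def split_week_row(row, n_sensors):
    """Split one flat sample row into a list of per-sensor hourly series.

    Assumption (not stated in the challenge description): columns are
    ordered sensor by sensor, each sensor contributing 168 consecutive
    hourly values.
    """
    expected = n_sensors * HOURS_PER_WEEK
    if len(row) != expected:
        raise ValueError(f"expected {expected} values, got {len(row)}")
    return [
        row[s * HOURS_PER_WEEK:(s + 1) * HOURS_PER_WEEK]
        for s in range(n_sensors)
    ]


# Made-up row with two hypothetical sensors, filled with dummy values.
row = list(range(2 * HOURS_PER_WEEK))
series = split_week_row(row, n_sensors=2)
```

Each entry of series is then a 168-step time series that can be fed to a sequence model one hour at a time.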
Each sample is identified by a unique identification number Id.
Each sample is identified by a unique start time of the time series FirstDayOfWeek.
The output file contains the time series to be predicted hourly from the input. These correspond to the predictions of the air quality inside the building and of the outside temperature and relative humidity obtained from the input. The output file is defined as follows. For each Id of the input dataset, the same Id of the output dataset contains the following quantities y_i, considered as a part of the air quality index (AQI).
The benchmark was obtained using an LSTM network with dropout, implemented with Torch, trained with the loss torch.nn.MSELoss() and the Adam optimizer with learning rate lr=1e-1, and the following hyperparameters:
epochs = 1000
nb_hidden = 8
nb_layers = 2
dropout_rate = 0.2
During the test phase, 1000 stochastic runs of the LSTM were used to produce 1000 samples for each data point to be predicted.
These samples were then used to build a confidence interval and produce the lower and upper bounds.
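The step from stochastic samples to lower and upper bounds can be sketched as follows. This is not the benchmark code: the LSTM is replaced by a hypothetical toy predictor (toy_stochastic_run) whose 8 hidden units are randomly dropped with rate 0.2, the 95% level is an arbitrary choice, and the bounds are taken as empirical quantiles of the samples at each hour, which is one common way to build such an interval.

```python
import random


def empirical_quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def interval_from_samples(samples, level=0.95):
    """samples[r][t] is the prediction of run r at hour t.

    Returns two lists (lower, upper) with one bound per hour.
    """
    alpha = (1.0 - level) / 2.0
    lower, upper = [], []
    for t in range(len(samples[0])):
        values = sorted(run[t] for run in samples)
        lower.append(empirical_quantile(values, alpha))
        upper.append(empirical_quantile(values, 1.0 - alpha))
    return lower, upper


def toy_stochastic_run(inputs, weights, rng, dropout_rate=0.2):
    """Hypothetical stand-in for one stochastic forward pass of the network.

    Each hidden unit is kept with probability 1 - dropout_rate and
    rescaled, so repeated calls give different predictions.
    """
    keep = 1.0 - dropout_rate
    outputs = []
    for x in inputs:
        total = 0.0
        for w in weights:
            if rng.random() < keep:
                total += w * x / keep
        outputs.append(total)
    return outputs


rng = random.Random(0)                          # fixed seed for repeatability
weights = [0.125] * 8                           # 8 hidden units, made-up weights
inputs = [40.0 + 0.1 * t for t in range(168)]   # made-up humidity-like signal
samples = [toy_stochastic_run(inputs, weights, rng) for _ in range(1000)]
lower, upper = interval_from_samples(samples, level=0.95)
```

With a real network, samples would instead hold the 1000 stochastic predictions for each hour, presumably obtained by leaving dropout active at test time; the interval computation itself is unchanged.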
Files
The files are accessible once you are logged in and registered for the challenge.