Challenge Data

Optimizing well-being at work
by Oze Energies


Started on Jan. 1, 2019

Challenge context

Well-being at work is characterized by how satisfied employees are with the thermal environment, the air quality, and the environmental noise of their workplace during working days. This subjective perception of environmental conditions, such as feeling too warm or too cold, has a tremendous impact on each individual's health, productivity and well-being at work. Designing a data-driven algorithm that predicts individuals' comfort with respect to their workplace environment is pivotal to tuning building management systems appropriately (for example, the set points of heating, ventilating, and air conditioning systems) so as to ensure and monitor the best comfort in the building.

Challenge goals

This challenge proposes to develop machine learning approaches that predict individuals' comfort from several time series of environmental data obtained from sensors in a large building. The objective is to learn a classifier that takes these time series as inputs and predicts the associated comfort class, computed as an average of the comfort classes of all individuals in the building, who are assumed to experience the same environmental conditions.

Data description

Three datasets are provided as CSV files: training inputs, training outputs, and test inputs. Both input datasets contain the environmental features that may be used to predict individuals' comfort, together with a timestamp and a unique identifier. Each input file has 7 comma-separated columns, named as follows in the first line (header):

ID: integer, uniquely identifies each observation.
Date: string, date in the format yyyy-mm-dd hh:mm:ss.
Temperature: real number, temperature inside the room.
Humidity: real number, humidity of the ambient air in the room.
Humex: real number, indicator of air quality in the room.
CO2: integer, CO2 level in the room in ppm (parts per million), indicator of air quality in the room.
Bright: integer, characterizes the brightness of the room.

Each line of the training output file contains the aggregated comfort class associated with the features on the line of the training input file that shares the same ID. The classes are {1,2,3,4,5}, 5 being the optimal comfort and 1 the worst. The comfort class is the mean of the comfort responses given by all individuals between two consecutive time steps, rounded to the closest integer. When this mean falls exactly halfway between two integers, e.g. 2.5, the smaller of the two is assigned, i.e. 2 in this case. When no comfort response is recorded between two time steps, the building manager assigns a comfort class.

The input and output training files are used to learn a classifier, while the test input file contains additional input data on which participants must predict comfort classes using their learned classifier.
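
The aggregation rule for the comfort class (mean of the responses, rounded to the closest integer, with exact midpoints going to the smaller integer) can be sketched as follows. This is only an illustration of the rule as stated in the description, not the organizers' code; the function name aggregate_comfort is hypothetical, and exact fractions are used so that midpoints such as 2.5 are detected without floating-point error.

```python
from fractions import Fraction
from math import ceil


def aggregate_comfort(responses):
    """Mean of the comfort responses, rounded to the closest integer.

    Exact midpoints (e.g. 2.5) are rounded down to the smaller integer,
    as stated in the data description.
    """
    if not responses:
        # With no responses, the description says the building manager
        # assigns the class, so there is nothing to compute here.
        raise ValueError("no comfort responses in this interval")
    mean = Fraction(sum(responses), len(responses))
    # ceil(mean - 1/2) rounds to nearest and sends ties downwards.
    return ceil(mean - Fraction(1, 2))


midpoint_class = aggregate_comfort([2, 3])       # mean is exactly 2.5
above_midpoint = aggregate_comfort([3, 3, 2])    # mean is about 2.67
```

Note that Python's built-in round() would not reproduce this rule in general, since it sends ties to the nearest even integer (round(3.5) gives 4, whereas the rule above gives 3).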
The solution file submitted by participants must follow the same format as the training output file (i.e. contain only two columns, ID and comfort class, where the ID values correspond to the test input data). It is compared to the true outputs (contained in a test output file unknown to participants) through a score computed using the challenge's metric. 8000 observations (i.e. lines) are available in the training datasets, while 2000 observations are used to test the classifier and compute its score. The metric used to rank classifiers is the accuracy score implemented in sklearn, sklearn.metrics.accuracy_score, i.e. the fraction of correctly classified observations.
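
As a rough illustration of the submission format and the metric, the sketch below writes a two-column file and computes the fraction of exact matches, which is what sklearn.metrics.accuracy_score returns with its default settings. The header name "Comfort" for the class column and the ID values are assumptions made for the example; the actual header should be copied from the training output file.

```python
import csv
import io


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("at least one observation is required")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def write_submission(ids, predictions, handle, class_column="Comfort"):
    """Write one 'ID,class' row per test observation.

    class_column is a placeholder name, not taken from the challenge files.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["ID", class_column])
    writer.writerows(zip(ids, predictions))


# Illustrative values only: three made-up test IDs and predicted classes.
buffer = io.StringIO()
write_submission([8000, 8001, 8002], [4, 3, 5], buffer)
submission_text = buffer.getvalue()

# Two of the three predictions match the (made-up) true classes.
score = accuracy([4, 2, 5], [4, 3, 5])
```

In practice the handle would be a file opened with open(path, "w", newline="") rather than an in-memory buffer.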

Benchmark description

The benchmark algorithm is a K-neighbors classifier with K=5, implemented using sklearn.neighbors.KNeighborsClassifier() with default parameters.
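
A minimal sketch of such a benchmark is given below. It assumes the five sensor columns are used as features and that ID and Date are left out, which the description does not state; random synthetic data stands in for the challenge files, so the resulting numbers are meaningless and only the workflow is illustrated.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Sensor columns named in the data description (ID and Date excluded,
# which is an assumption about how the benchmark was built).
FEATURES = ["Temperature", "Humidity", "Humex", "CO2", "Bright"]

# Synthetic stand-in for the real training and test inputs.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, len(FEATURES)))
y_train = rng.integers(1, 6, size=200)  # comfort classes 1..5
X_test = rng.normal(size=(50, len(FEATURES)))

# Default parameters: n_neighbors is 5 by default.
model = KNeighborsClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
train_accuracy = accuracy_score(y_train, model.predict(X_train))
```

Because K-neighbors relies on distances, features on very different scales (for instance CO2 in ppm versus temperature) may dominate the result; whether the benchmark rescales its inputs is not stated.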



The challenge provider