Suez

by Suez

From Jan. 6, 2020 to Dec. 18, 2020

As a provider of drinking water to millions of consumers,
**SUEZ** needs to know the exact volume consumed by each client (i.e. the index of their meter).
A modern solution to this problem is telemetering, whereby the meter automatically transmits its daily index to our servers.
This is already deployed on millions of meters, but there are still some contracts where our operators have to visit the meters
once a year, sometimes more.
This often involves arranging a meeting with the client when the meters are on private property, which can prove difficult (think of second homes).
**The goal of this challenge is to simplify the process by allowing clients to do the reading themselves if it is more convenient**:
they could just take a picture of the meter and upload it to our servers, whereupon a Machine Learning algorithm would validate it and read the digits to get the index.

Prototypes of this project already exist but require the client to send the picture by email to our service center, which analyzes it and replies several hours later, making any feedback on the picture quality very difficult.

For this challenge, we'll assume that every image represents a meter with an index that can be read by a human.

The goal of this challenge is to design an algorithm that reads the consumption index from a valid picture of a meter.

The data consists of 1000 annotated RGB pictures of meters. The quality is quite heterogeneous, as can be expected given that meters are often located underground. The meters share a common shape but are not all identical, and some can be rotated. The index part consists of (up to) 8 rotating wheels that display the digits of the index: 5 white-on-black digits for the cubic meters, followed by 3 white-on-red (or red-on-white) digits for the liters. By construction, the wheels may be rotating at the precise moment the picture is taken, but we'll make sure for this challenge that the digits are unambiguous.
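To make the wheel layout concrete, here is a minimal Python sketch that splits a full reading into its cubic-meter and liter parts. The helper name `split_reading` and the assumption that all 8 wheels are visible and read from left to right are ours, not part of the challenge material.

```python
def split_reading(digits):
    """Split 8 wheel digits into (cubic meters, liters).

    Hypothetical helper: assumes 5 cubic-meter digits followed by
    3 liter digits, all visible and read from left to right.
    """
    if len(digits) != 8 or any(not 0 <= d <= 9 for d in digits):
        raise ValueError("expected 8 digits between 0 and 9")
    cubic_meters = int("".join(str(d) for d in digits[:5]))
    liters = int("".join(str(d) for d in digits[5:]))
    return cubic_meters, liters


# Made-up reading: wheels showing 00427 (cubic meters) and 350 (liters).
cubic_meters, liters = split_reading([0, 0, 4, 2, 7, 3, 5, 0])
print(cubic_meters, liters)
```

Only the first value, the truncated number of cubic meters, corresponds to what the annotations contain.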

For each of the pictures, a human annotated the index in cubic meters (truncated).

The `index.csv` file is a CSV file with one line and two columns per meter: ID and index (truncated cubic meters).
The picture corresponding to a given meter can be found using its ID: `<meter ID>.jpg`

The first line of this file is the header (`ID,index`).
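The file format described above can be read with Python's standard `csv` module. The following is a minimal sketch: the helper names `load_index` and `picture_name` are ours, the sample IDs and values are made up, and we assume the index column parses as a number.

```python
import csv
import io


def load_index(csv_file):
    """Read an `ID,index` CSV into a dict mapping meter ID to index."""
    reader = csv.DictReader(csv_file)  # the first line is the header
    # Assumption: the index is stored as a number (e.g. "427" or "427.0").
    return {row["ID"]: int(float(row["index"])) for row in reader}


def picture_name(meter_id):
    """File name of the picture for a given meter ID (`<meter ID>.jpg`)."""
    return f"{meter_id}.jpg"


# Made-up sample content, only to illustrate the format.
sample = io.StringIO("ID,index\n0a1b,427\n9f3c,10052\n")
labels = load_index(sample)
for meter_id, index in labels.items():
    print(picture_name(meter_id), index)
```

With the real data, the same function can be applied to a file opened with `open("index.csv", newline="")`.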

Given that domestic clients have a water consumption below 1 cubic meter / day (a bath is around 300 L)
and that we do not take liters into account when billing the client, we only care about the last three cubic meter digits.
In the following, $y_i$ and $\hat{y}_i$ refer to the water volume and the estimated water volume (*i.e.* the reading), expressed in cubic meters.

However, we need the readings to be exact and thus opt for a modified zero-one loss:

$\frac{1}{N}\sum_{0\leq i < N} \left\lbrace \begin{array}{ll} 0 & \textrm{if } \lfloor \hat{y}_i \rfloor \equiv \lfloor y_i \rfloor \pmod{1000}\\ 1 & \textrm{if } \lfloor \hat{y}_i \rfloor \not\equiv \lfloor y_i \rfloor \pmod{1000} \end{array} \right.$

where $\lfloor{\cdot}\rfloor$ is the integer part of a floating point number. It is worth noting, however, that the annotations include all the visible digits of the volume in cubic meters (thousands of cubic meters etc.). Some competitors may find this additional information useful for training and only truncate for the final submission.
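The loss above can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the official scoring code: the function name `modified_zero_one_loss` is ours and the example values are made up.

```python
import math


def modified_zero_one_loss(y_true, y_pred):
    """Fraction of readings whose last three cubic-meter digits are wrong."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = sum(
        math.floor(t) % 1000 != math.floor(p) % 1000
        for t, p in zip(y_true, y_pred)
    )
    return errors / len(y_true)


# Made-up values: the first two readings match modulo 1000, the third does not.
y_true = [10427, 52.0, 999]
y_pred = [427.8, 52, 1000]
loss = modified_zero_one_loss(y_true, y_pred)
print(loss)
```

Because the comparison is made modulo 1000, a model trained on the full annotated index can be scored directly, without truncating its predictions first.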
