# Challenge Data

### Semantic segmentation of industrial facility point cloud by EDF R&D

Because of the large number of points in the test set, the submission file weighs around 500 MB. Please allow several minutes (up to 10) for the upload and scoring of your submissions.

#### Description

##### Dates

Started on Jan. 5, 2022

##### Challenge context

Our Team is part of EDF Research & Development.

Électricité de France (EDF) is a French multinational utility company whose activities include electricity generation and distribution, power plant design, maintenance and dismantling, and electricity transport and trading. EDF Research & Development aims to:

• Improve EDF Group performance in all of its current ventures and enable customers to benefit from it;
• Prepare the energy scenarios of the future by working on disruptive technologies;
• Carry out research for external commissioning bodies within the framework of partnerships or orders.

In the context of increasing maintenance operations and generation renewal work, EDF has developed methods and tools to create and explore an "as-built digital mock-up" of entire industrial buildings from different data sources, such as as-built 3D models, panoramic photographs, and point clouds from laser scans. In particular, our team leverages the as-built digital mock-up by semantically enriching the photographs and 3D point clouds. It also explores automatic point cloud segmentation and CAD reconstruction.

##### Challenge goals

The goal of this challenge is to perform a semantic segmentation of a 3D point cloud.

The point cloud of one EDF industrial facility digital mock-up comprises 45 billion 3D points. The reconstruction work consists of fitting 90,000 geometric primitives onto the point cloud. To perform this task, operators have to manually segment the part of the point cloud corresponding to a piece of equipment and then fit the suitable geometric primitive. This manual segmentation is the most tedious step of the overall production scheme; therefore, EDF R&D studies solutions to perform it automatically.

Because EDF industrial facilities are sensitive and hardly accessible or available for experiments, our team works with the EDF Lab Saclay boiling room. The digital mock-up of this test environment was produced with the same methodology as the other industrial facilities.

For the ENS challenge, EDF provides a dataset of 2.1 billion points acquired in an industrial environment: the boiling room of EDF Lab Saclay, whose design is sufficiently close to an industrial building for this segmentation task. Each point of the cloud has been manually assigned a ground-truth label.

The project purpose is a semantic segmentation task of a 3D point cloud. It consists in training a machine learning model $f$ to automatically segment the point cloud $x=(x_i)_{1 \leq i \leq N}$ in different classes $y=(y_i)_{1 \leq i \leq N}$ where $N$ is the point cloud size. The model infers a label class $y_i = f(x_i)$ for each point $x_i$ .

To assess the results, we compute the weighted F1-score over all $C$ classes (`sklearn.metrics.f1_score` with `average="weighted"`). It is defined by:

$F_1 := \sum_{i=0}^{C-1} w_i \, \frac{2 \, P_i R_i}{P_i + R_i}, $

where $P_i$ and $R_i$ are respectively the point-wise precision and recall of class $i$, and $w_i$ is the proportion of true instances belonging to class $i$.
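As a minimal sketch, the metric can be computed in plain Python; this mirrors `sklearn.metrics.f1_score(y_true, y_pred, average="weighted")` without the dependency. The toy labels below are made up for illustration.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted F1: per-class F1 averaged with weights proportional to
    each class's number of true instances (its support), matching
    sklearn.metrics.f1_score(..., average="weighted")."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (support[c] / total) * f1
    return score

# Toy example: 6 points, 3 classes
print(round(weighted_f1([0, 0, 1, 1, 1, 3], [0, 1, 1, 1, 3, 3]), 4))  # → 0.6667
```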

##### Data description

The boiling room was digitized with LiDAR scanners on tripods, using 67 scanner positions. Each acquisition at one scanner position (called a "station") produces one point cloud of about 30 million points. The point clouds of the set of stations are registered in a single reference frame. Randomly subsampled point clouds will be provided: the train set contains 50 station point clouds and the test set contains the remaining 17 stations. The compressed and subsampled dataset weighs 2 GB.

The input variables are point clouds $x = (x_i)_{1 \leq i \leq N}$ where $N$ is the point cloud size and $x_i \in \mathbb{R}^7$, whose coordinates are:

• a triplet of 3D spatial coordinates in a global reference frame,
• a scalar value corresponding to the intensity return of the laser beam,
• RGB values corresponding to the reflected color.

The station-wise point clouds will be provided in compressed PLY format. Here is an example of the header:

```
ply
format binary_little_endian 1.0
comment Trimble - RealWorks
obj_info Point Cloud Generated by Trimble RealWorks 11.9
element point 38335195
property float x
property float y
property float z
element color 38335195
property uchar red
property uchar green
property uchar blue
element intensity 38335195
property uchar variation
end_header
```


The PLY point clouds can be read with the Python library `plyfile`.
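For illustration, here is a minimal standard-library sketch of reading a binary little-endian PLY laid out like the header above (separate point / color / intensity elements). In practice the `plyfile` package handles this more robustly; the two-point file built in memory below is a made-up example to exercise the reader.

```python
import io
import struct

def read_ply(buf):
    """Parse the ASCII header, then read each element's binary records
    in header order (little-endian, per the format line)."""
    type_fmt = {"float": "f", "uchar": "B"}   # PLY type -> struct format char
    elements = []                              # [name, count, format chars]
    assert buf.readline().strip() == b"ply"
    while True:
        parts = buf.readline().split()
        if parts[0] == b"end_header":
            break
        if parts[0] == b"element":
            elements.append([parts[1].decode(), int(parts[2]), []])
        elif parts[0] == b"property":
            elements[-1][2].append(type_fmt[parts[1].decode()])
        # "format", "comment" and "obj_info" lines are skipped
    data = {}
    for name, count, fmts in elements:
        fmt = "<" + "".join(fmts)
        rec = struct.calcsize(fmt)
        raw = buf.read(count * rec)
        data[name] = [struct.unpack_from(fmt, raw, i * rec) for i in range(count)]
    return data

# Build a tiny 2-point PLY in memory, mimicking the header structure above.
header = (b"ply\nformat binary_little_endian 1.0\n"
          b"element point 2\nproperty float x\nproperty float y\nproperty float z\n"
          b"element color 2\nproperty uchar red\nproperty uchar green\nproperty uchar blue\n"
          b"element intensity 2\nproperty uchar variation\nend_header\n")
body = struct.pack("<3f", 1.0, 2.0, 3.0) + struct.pack("<3f", 4.0, 5.0, 6.0)
body += struct.pack("<3B", 255, 0, 0) + struct.pack("<3B", 0, 255, 0)
body += struct.pack("<B", 10) + struct.pack("<B", 20)

cloud = read_ply(io.BytesIO(header + body))
print(cloud["point"][0])  # → (1.0, 2.0, 3.0)
```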

The output variables are the segmentation classes $y = (y_i)_{1 \leq i \leq N}$, where each $y_i$ is one of the $C$ class labels. The classes are:

• Background: 0
• Beams: 1
• Cabletrays: 2
• Civils: 3
• Gratings: 4
• Guardrails: 5
• Hvac: 6
• Piping: 8
• Supports: 9

The ground truth labels will be provided in a single CSV file, which contains as many lines as there are points in the PLY clouds. On each line, the first element is the point index, followed by the class label.

An additional file, map_ind_station.csv, provides the mapping between a point index and its station index. Its lines are of the form:

```
station_id, point_id_low, point_id_high
```


which means that the points of station station_id are assigned the index range from point_id_low to point_id_high, both ends included. These index ranges must be used in the submitted CSV prediction file as well:

```
ID,class
83645062,0
83645063,0
...
93562962,1
106424117,2
...
```
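The station lookup described above can be sketched as follows. The three-column layout (inclusive on both ends) follows the description; the CSV contents and the header row here are illustrative assumptions, not the challenge's real values.

```python
import csv
import io

# Made-up stand-in for map_ind_station.csv (real ranges are much larger).
toy_csv = """station_id,point_id_low,point_id_high
0,0,999
1,1000,2499
2,2500,2999
"""

# Parse the mapping into (station_id, low, high) tuples.
ranges = []
for row in csv.DictReader(io.StringIO(toy_csv)):
    ranges.append((int(row["station_id"]),
                   int(row["point_id_low"]),
                   int(row["point_id_high"])))

def station_of(point_id):
    """Return the station owning this point index, or None if out of range."""
    for sid, lo, hi in ranges:
        if lo <= point_id <= hi:   # both ends inclusive
            return sid
    return None

print(station_of(1500))  # → 1
```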

##### Benchmark description

The benchmark is a constant function that predicts the majority class, 3 (Civils), for every point.
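This constant baseline can be reproduced in a few lines, writing the `ID,class` submission format shown above. The point-index range below is made up; a real submission would iterate over the test index ranges from map_ind_station.csv.

```python
import csv
import io

def write_constant_submission(out, point_ids, label=3):
    """Write a submission predicting the same class for every point."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ID", "class"])
    for pid in point_ids:
        writer.writerow([pid, label])

# Toy range of three point indices, all predicted as class 3 (Civils).
buf = io.StringIO()
write_constant_submission(buf, range(100, 103))
print(buf.getvalue())  # ID,class header, then one "<id>,3" row per point
```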

#### Files

Files are accessible when logged in and registered to the challenge

#### The challenge provider

EDF R&D Saclay | Acquisition et traitement image/3D