Challenge Data

Semantic segmentation of industrial facility point cloud
by EDF R&D

31/01/22: we corrected an inaccuracy in the xtest and xtrain files. Please check that you have the latest versions. Also, because of the large number of points in the test set, the submission file weighs around 500 MB. Please allow several minutes (up to 10) for the upload and scoring of your submissions.



Competitive challenge
More than 1GB
Advanced level


Started on Jan. 5, 2022

Challenge context

Our team is part of EDF Research & Development.

Électricité de France (EDF) is a French multinational utility company whose activities include electricity generation and distribution, power plant design, maintenance and dismantling, and electricity transport and trading. EDF Research & Development aims to:

  • Improve EDF Group performance in all of its current ventures and enable customers to benefit from it;
  • Prepare the energy scenarios of the future by working on disruptive technologies;
  • Carry out research for external commissioning bodies within the framework of partnerships or orders.

In the context of increased maintenance operations and generation renewal work, EDF has developed methods and tools to create and explore an "as-built digital mock-up" of entire industrial buildings from different data sources, such as as-built 3D models, panoramic photographs, and point clouds from laser scans. In particular, our team builds on the as-built digital mock-up by semantically enriching the photographs and 3D point clouds. It also explores automatic point cloud segmentation and CAD reconstruction.

Challenge goals

The goal of this challenge is to perform a semantic segmentation of a 3D point cloud.

The point cloud of one EDF industrial facility digital mock-up is composed of 45 billion 3D points. The reconstruction work consists of fitting 90,000 geometric primitives to the point cloud. To perform this task, operators have to manually segment the part of the point cloud corresponding to a piece of equipment and then fit a suitable geometric primitive. This manual segmentation is the most tedious step of the overall production process. EDF R&D therefore studies solutions to perform it automatically.

as-built digital mock-up creation with CAD reconstruction from point cloud

Because EDF industrial facilities are sensitive and difficult to access or to make available for experiments, our team works with the EDF Lab Saclay boiling room. The digital mock-up of this test environment has been produced with the same methodology as for the other industrial facilities.

For the ENS challenge, EDF provides a dataset with a cloud of 2.1 billion points acquired in an industrial environment: the boiling room of EDF Lab Saclay, whose design is sufficiently close to that of an industrial building for this segmentation task. Each point of the cloud has been manually given a ground truth label.

The purpose of the project is the semantic segmentation of a 3D point cloud. It consists in training a machine learning model $f$ to automatically segment the point cloud $x = (x_i)_{1 \leq i \leq N}$ into different classes $y = (y_i)_{1 \leq i \leq N}$, where $N$ is the point cloud size. The model infers a class label $y_i = f(x_i)$ for each point $x_i$.

To assess the results, we compute the weighted F1-score over all $C$ classes (sklearn.metrics.f1_score). It is defined by:

$$F_1 := \sum_{i=0}^{C-1} w_i \frac{P_i \times R_i}{P_i + R_i},$$

where $P_i$, $R_i$ are respectively the point-wise precision and recall of the class $i$, and $w_i$ is the inverse of the number of true instances for class $i$.
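As an illustration, here is a minimal pure-Python sketch of a weighted F1-score. It assumes the scorer behaves like scikit-learn's `f1_score` with `average="weighted"`, which uses the per-class F1 with its usual factor of 2 and weights each class by its share of true instances; this may differ from the formula as written above, and the exact scoring options used by the challenge are not confirmed here. The function name `weighted_f1` and the sample labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted F1 (assumed to mirror sklearn's average="weighted")."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)      # number of true instances per class
    predicted = Counter(y_pred)    # number of predicted instances per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    score = 0.0
    for c in support:              # classes absent from y_true get weight 0
        tp = true_pos[c]
        precision = tp / predicted[c] if predicted[c] else 0.0
        recall = tp / support[c]
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (support[c] / total) * f1
    return score


# Illustrative labels only, not taken from the dataset.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
example_score = weighted_f1(y_true, y_pred)

# A constant prediction of one class, scored against mixed labels.
constant_score = weighted_f1([3, 3, 0], [3, 3, 3])
```

On the full test set a vectorised implementation such as scikit-learn's would be preferable; the loop above is only meant to make the per-class terms explicit.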

Data description

The boiling room was digitized with tripod-mounted LiDAR scanners from 67 scanner positions. Each acquisition at one scanner position (called a “station”) produces one point cloud of about 30 million points. The point clouds of all stations are registered in a single reference frame. Randomly subsampled point clouds will be provided. The train set contains the point clouds of 50 stations and the test set contains the remaining 18 stations. The compressed and subsampled dataset weighs 2 GB.

EDF point cloud dataset: point cloud coloured according to the RGB value (left) and station index (right)

The input variables are point clouds $x = (x_i)_{1 \leq i \leq N}$, where $N$ is the point cloud size and $x_i \in \mathbb{R}^7$, whose coordinates are:

  • triplet of 3D spatial coordinates in a global reference frame,
  • a scalar value corresponding to the intensity return of the laser beam,
  • RGB values corresponding to the reflected color.

The station-wise point cloud will be provided in compressed PLY format. Here is an example of the header:

format binary_little_endian 1.0
comment Trimble – RealWorks
obj_info Point Cloud Generated by Trimble RealWorks 11.9
element point 38335195
property float x
property float y
property float z
element color 38335195
property uchar red
property uchar green
property uchar blue
element intensity 38335195
property uchar variation

The PLY point cloud can be read with the Python library plyfile.
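To make the layout of the header explicit before loading a file, the sketch below parses the header lines shown above with the standard library only. It is not a replacement for plyfile: it handles scalar `property` lines only and reads no point data. The function name `parse_ply_header` is hypothetical.

```python
def parse_ply_header(lines):
    """Return {element: {"count": int, "properties": [(type, name), ...]}}."""
    elements = {}
    current = None
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue
        if tokens[0] == "end_header":
            break
        if tokens[0] == "element" and len(tokens) == 3:
            current = tokens[1]
            elements[current] = {"count": int(tokens[2]), "properties": []}
        elif tokens[0] == "property" and len(tokens) == 3 and current is not None:
            # Scalar property: "property <type> <name>". List properties are skipped.
            elements[current]["properties"].append((tokens[1], tokens[2]))
    return elements


# Header lines copied from the example above.
header = """format binary_little_endian 1.0
comment Trimble – RealWorks
obj_info Point Cloud Generated by Trimble RealWorks 11.9
element point 38335195
property float x
property float y
property float z
element color 38335195
property uchar red
property uchar green
property uchar blue
element intensity 38335195
property uchar variation
""".splitlines()

layout = parse_ply_header(header)
for name, info in layout.items():
    print(name, info["count"], [prop for _, prop in info["properties"]])
```

The element and property names found this way (`point`, `color`, `intensity`) are the keys one would expect to use with plyfile, for instance `PlyData.read(path)["point"]["x"]`, where `path` is a placeholder for a decompressed station file.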

The output variables are the segmentation classes $y = (y_i)_{1 \leq i \leq N}$, where $y_i \in \{0, \dots, C-1\}$ and $C$ is the number of classes. The classes are:

  • Background: 0
  • Beams: 1
  • Cabletrays: 2
  • Civils: 3
  • Gratings: 4
  • Guardrails: 5
  • Hvac: 6
  • Ladders: 7
  • Pipping: 8
  • Supports: 9

The ground truth will be provided in a single CSV file. The CSV file contains as many lines as there are points in the PLY point cloud. On each line, the first element is the point index, followed by the class label.

An additional file map_ind_station.csv provides the mapping between the point index and its station index. Its lines are of the form:

station_id, point_id_low, point_id_high

which means that the points of station station_id are assigned the index range from point_id_low to point_id_high, both ends included. These index ranges must be used in the submitted CSV prediction file as well.
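The lookup from a point index to its station can be sketched as follows. This is a minimal example under stated assumptions: station identifiers are taken to be integers, a possible header line is skipped, and the sample rows are invented for illustration (they are not the real contents of map_ind_station.csv). The function names are hypothetical.

```python
import bisect
import csv
import io


def load_station_ranges(fileobj):
    """Parse rows 'station_id, point_id_low, point_id_high' into sorted tuples."""
    ranges = []
    for row in csv.reader(fileobj, skipinitialspace=True):
        if len(row) != 3:
            continue                     # blank or malformed line
        try:
            station, low, high = (int(value) for value in row)
        except ValueError:
            continue                     # assumed header line, skipped
        ranges.append((low, high, station))
    ranges.sort()
    return ranges


def station_of(point_id, ranges):
    """Return the station whose inclusive range [low, high] contains point_id."""
    lows = [low for low, _, _ in ranges]  # precompute once for many lookups
    i = bisect.bisect_right(lows, point_id) - 1
    if i >= 0 and ranges[i][0] <= point_id <= ranges[i][1]:
        return ranges[i][2]
    raise KeyError(f"point index {point_id} is not covered by any station")


# Invented sample rows in the documented format.
sample = io.StringIO("0, 0, 99\n1, 100, 249\n2, 250, 399\n")
ranges = load_station_ranges(sample)
print(station_of(100, ranges))
```

Both bounds are treated as inclusive, matching the description above.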


Benchmark description

The benchmark is a constant function which predicts the majority class 3 (Civils) for every point.
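A constant benchmark of this kind can be sketched in a few lines. The example below assumes a prediction file with two columns per line (point index, then class label) and no header row; the exact submission layout is not specified here, so treat the column order, the missing header, and the function name `write_constant_predictions` as assumptions. The index range used in the example is invented.

```python
import csv
import io

MAJORITY_CLASS = 3  # "Civils" in the class list above


def write_constant_predictions(fileobj, point_id_low, point_id_high,
                               label=MAJORITY_CLASS):
    """Write one 'point_id,label' row per index in the inclusive range."""
    writer = csv.writer(fileobj, lineterminator="\n")
    count = 0
    for point_id in range(point_id_low, point_id_high + 1):
        writer.writerow([point_id, label])
        count += 1
    return count


# Illustrative index range only; real ranges come from map_ind_station.csv.
buffer = io.StringIO()
n_rows = write_constant_predictions(buffer, 100, 104)
print(buffer.getvalue())
```

Writing row by row keeps memory use flat, which matters given the size of the test set.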


Files are accessible when logged in and registered to the challenge

The challenge provider


EDF R&D Saclay | Acquisition et traitement image/3D

Congratulations to the winners of the challenge

1 Loick Chambon
2 ccgg01
3 trageau

You can find the whole list of winners of the season here