Pierre Manceron - Head of Science at Raidium
The data was updated on February 8, 2024. The number of images in the y_train file has been reduced to 2000 to match the x_train images provided.
Started on Jan. 10, 2024
Here the goal is to segment structures using their shape, without exhaustive annotations. The training data is composed of two types of images:
1) CT images with anatomical and oncologic segmentation masks of individual structures
2) Raw CT images, without any segmented structures
The test set is made of new images with their corresponding segmented structures, and the metric measures the capacity to correctly segment and separate the different structures on an image.
Note: The segmented structures do not cover the entire image; some pixels are not part of any identifiable structure, as we can see in the image above. They are thus considered part of the background.
The input is a list of 2D grayscale images (i.e. a 3D numpy array), each corresponding to a slice of a CT-scan (in the transverse plane) of size 512x512 pixels. The slices are shuffled, so there is no 3D information.
The label/output is a list of 2D matrices (i.e. a 3D numpy array) of size 512x512 pixels, with integer (uint8) values. Each value $Y_{i,w,h}$ at position (w, h) of matrix $Y_i$ identifies a structure.
For example, in Figure 1 above, the 23 colors correspond to 23 different segmented structures, so each pixel label $Y_{i,w,h}$ at position (w, h) takes integer values in $\{0, 1, \dots, 23\}$, where 0 is a special value meaning the pixel is not part of any structure and thus belongs to the background.
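As a quick illustration of this encoding (a toy sketch; the label matrix below is made up, not real data), the structures present in an image and the binary mask of any one of them can be recovered with numpy:

import numpy as np

# Toy 512x512 label matrix standing in for one Y_i (in practice it is
# loaded from the CSV, as shown below).
y = np.zeros((512, 512), dtype=np.uint8)
y[100:200, 100:200] = 4                 # a hypothetical structure labeled 4

structure_ids = np.unique(y)            # array([0, 4], dtype=uint8)
n_structures = len(structure_ids) - 1   # exclude the background value 0
mask_4 = (y == 4)                       # boolean mask of structure 4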
In practice, the output is encoded as a CSV file that contains the transpose of the flattened label matrix. Note: The transpose is used here for performance reasons: Pandas is very slow to load CSV files with many columns, but very fast to load CSV files with many rows. Thus, the CSV is composed of 262144 (= 512x512) rows, each corresponding to a pixel of the image, and 500 columns, each corresponding to an image.
To get the list of 2D predictions, you must therefore transpose the received CSV, and reshape it:
import pandas as pd

# Load the CSV (262144 rows x 500 columns), transpose it back to
# (500 images x 262144 pixels), then reshape each row into a 512x512 matrix.
predictions = pd.read_csv(label_csv_path, index_col=0, header=0).T.values.reshape((-1, 512, 512))
# In the end, we get a list of 2D predictions, i.e. a 3D numpy array of shape (500, 512, 512)
To produce the output CSV from a list of 2D predictions, flatten each prediction, concatenate them into a single matrix, transpose that matrix, and finally save it as a CSV file:
import numpy as np
import pandas as pd

# predictions is a list of 2D predictions stacked into a 3D numpy array of
# shape (500, 512, 512), e.g. predictions = np.stack([prediction_1, prediction_2, ...])
# Flatten each 512x512 prediction into a row of 262144 values, transpose so
# that images become columns, and write the result to a CSV file.
pd.DataFrame(predictions.reshape((predictions.shape[0], -1))).T.to_csv(output_csv_path)
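As a sanity check (a sketch; the file name and the small number of images are arbitrary), the save and load steps should be inverses of each other:

import numpy as np
import pandas as pd

# Hypothetical predictions: 3 random label maps instead of the full 500.
predictions = np.random.randint(0, 24, size=(3, 512, 512), dtype=np.uint8)
pd.DataFrame(predictions.reshape((3, -1))).T.to_csv("roundtrip.csv")

reloaded = pd.read_csv("roundtrip.csv", index_col=0, header=0).T.values.reshape((-1, 512, 512))
assert (reloaded == predictions).all()  # the round trip preserves every pixel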
This problem can be seen as an image-wise pixel clustering problem, where each structure is a cluster in the image: the pixel label values are structure identifiers within a specific image and are not necessarily coherent between images. For example, the structure associated with the liver can be mapped to label 4 on one image and to label 1 on another.
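This permutation invariance can be checked directly with the Rand Index from scikit-learn (a minimal sketch; the two labelings below are made up):

import numpy as np
from sklearn.metrics import rand_score

# Two labelings of the same flattened image that differ only in numbering:
# structure 1 in `a` is structure 2 in `b`, and vice versa.
a = np.array([1, 1, 2, 2, 2])
b = np.array([2, 2, 1, 1, 1])
print(rand_score(a, b))  # 1.0: the two clusterings are identical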
The train set is composed of 2000 images, split into the two groups described above: images with segmentation masks of individual structures, and raw images without any annotation.
Note: Segmentation maps of the train-set structures (in addition to the CSV) are given in the supplementary materials.
The test set is composed of 500 images with organ and tumor segmentations. For these images, the corresponding label is a 2D matrix containing the segmented structures, with all other pixels set to 0. Given that an individual image with its label matrix is around 400 KB, and that we have 1500 images, the dataset is about 600 MB in total.
Note: The segmentation map is not dense, meaning that some pixels between structures are not segmented, as we can see in the image above. These pixels are considered to be part of the background.
Note: It is not allowed to use additional radiological training data, radiological pre-trained models, or any other external radiological data source. You are, however, allowed to use pre-trained models and data that are not radiological (DINO v2, SAM, ...).
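For instance, per-patch features from a non-radiological backbone such as DINO v2 can be extracted and then clustered image by image. The sketch below assumes the torch.hub entry point published by the facebookresearch/dinov2 repository; the preprocessing (replicating the grayscale channel, resizing to a multiple of the 14-pixel patch size, skipping ImageNet normalization) is our own simplification, not a prescribed pipeline:

import torch
import torch.nn.functional as F

# Load a small DINO v2 backbone via torch.hub (requires internet access).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

slice_2d = torch.rand(512, 512)          # stand-in for one CT slice in [0, 1]
x = slice_2d.expand(1, 3, 512, 512)      # replicate the channel to fake RGB
x = F.interpolate(x, size=(518, 518), mode="bilinear")  # 518 = 37 * 14

with torch.no_grad():
    out = model.forward_features(x)
# One feature vector per 14x14 patch, usable for per-image clustering.
patch_feats = out["x_norm_patchtokens"].reshape(37, 37, -1)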
The global metric is computed by averaging the Rand Index between each label and its associated prediction, excluding background pixels (i.e. pixels equal to 0 in the label). This clustering metric is invariant to inter-image permutations of the structure numbering and is implemented in scikit-learn as sklearn.metrics.rand_score.
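A possible per-image implementation of this metric (a sketch under our reading of the rule: the background pixels of the label are masked out before computing the Rand Index) is:

import numpy as np
from sklearn.metrics import rand_score

def mean_rand_index(labels, predictions):
    # labels, predictions: 3D arrays of shape (n_images, 512, 512).
    # Assumes every label matrix contains at least one structure pixel.
    scores = []
    for y, p in zip(labels, predictions):
        mask = y.ravel() != 0  # keep only pixels belonging to a structure
        scores.append(rand_score(y.ravel()[mask], p.ravel()[mask]))
    return float(np.mean(scores))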
A getting started notebook can be found at the following address:
https://colab.research.google.com/drive/1OOzMtT62OFl_tURo4TWjFKJf6jRMc5_O?usp=share_link
It includes examples of data loading, visualization, metric computation and a baseline.
The benchmark is based on classical vision algorithms, such as the watershed transform and Sobel filters.
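A minimal baseline in that spirit (a sketch using scikit-image; the intensity thresholds are arbitrary, hypothetical values, not the benchmark's actual parameters) could look like:

import numpy as np
from scipy import ndimage as ndi
from skimage.filters import sobel
from skimage.segmentation import watershed

def segment_slice(image):
    # Segment one 512x512 grayscale slice with a Sobel-gradient watershed.
    gradient = sobel(image.astype(float))

    # Markers from intensity thresholds (hypothetical values, to be tuned).
    markers = np.zeros_like(image, dtype=np.int32)
    markers[image < 30] = 1    # likely background
    markers[image > 150] = 2   # likely structure interior

    labels = watershed(gradient, markers)
    labels[labels == 1] = 0    # map the background marker to label 0

    # Split the foreground into connected components so that each
    # structure gets its own identifier.
    structures, _ = ndi.label(labels > 0)
    return structures.astype(np.uint8)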