Challenge Data

PhotoRoom Object Segmentation from Synthetic Images
by Artizans




Started on Jan. 6, 2020

Challenge context

We are the makers of PhotoRoom, an iPhone image editor that lets users work with physical objects instead of pixels. Our segmentation models automatically remove the background from images and suggest edits to users.

As part of our experiments with synthetic data, we generated a synthetic dataset of 20,000 training images (plus 1,200 test images). The images are generated:

  • From very high-quality 3D models
  • Using high-quality textures
  • Under various lighting conditions
  • Using a physically based renderer

We challenge students to tackle the task of segmenting objects from images.

Challenge goals


Provided with an image of an object, the goal is to segment the salient (main) object in the scene. For each pixel of the input image, the model must categorize it as foreground or background.


Since the masks are binary (each pixel is either 0 for background or 1 for foreground), we will score submissions using the Dice coefficient, which measures the pixel-wise agreement between a predicted segmentation and its corresponding ground truth. The formula is given by:

\frac{2\,|X \cap Y|}{|X| + |Y|}


  • X is the predicted set of foreground pixels
  • Y is the ground-truth set of foreground pixels

The leaderboard score is the mean of the Dice coefficients for each image in the test set.
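The metric above can be sketched in a few lines of NumPy. This is a minimal sketch, not the official scoring function (which is provided in the challenge repository); the convention of returning 1.0 when both masks are empty is an assumption.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient between two binary masks (values 0/1)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total
```

The leaderboard score would then be the mean of this value over all test images.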

Data description

The train set and images from the test set are available here

About the test set

The test set is made of 1,200 images. Some images show objects that also appear in the train set, while others show objects not present in the train set. The same goes for the textures used for the "floor" of the images.

The goal is to create a model that can generalize to new shapes and textures.

Input specification

RGB image of size 1280 x 720 pixels in JPEG format.

Output specification

Binary mask of size 1280 x 720 pixels in PNG format, where each pixel has a value of 1 for foreground and 0 for background.

Submission format

To reduce the size of the submission, the output will be encoded using run-length encoding (RLE).

The expected output is a CSV file with a header and two named columns:

  • img: the id of the image (e.g. 1103_1103)
  • rle_mask: the RLE encoding of the mask
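A run-length encoder can be sketched as below. The exact RLE convention expected by the scorer is defined by the helper functions the challenge provides; this sketch assumes the common convention of space-separated "start length" pairs with 1-indexed positions over the flattened mask.

```python
import numpy as np

def rle_encode(mask: np.ndarray) -> str:
    """Encode a binary mask as space-separated 'start length' pairs
    (1-indexed positions over the row-major flattened mask)."""
    flat = mask.flatten()
    # Pad with zeros so every run of ones has a detectable start and end.
    padded = np.concatenate([[0], flat, [0]])
    changes = np.where(padded[1:] != padded[:-1])[0] + 1  # 1-indexed
    changes[1::2] -= changes[::2]  # turn end positions into run lengths
    return " ".join(str(x) for x in changes)

def rle_decode(rle: str, shape: tuple) -> np.ndarray:
    """Inverse of rle_encode for a mask of the given (height, width)."""
    nums = list(map(int, rle.split()))
    starts, lengths = nums[::2], nums[1::2]
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        mask[start - 1 : start - 1 + length] = 1
    return mask.reshape(shape)
```

Whatever convention you implement, round-tripping a mask through encode and decode should reproduce it exactly.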

A sample submission is provided as well as a repository containing:

  • helper functions to convert masks to RLE (run-length encoding) and back
  • a scoring function for the Dice score
  • a starter project using Keras to generate predictions
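Before encoding a prediction for submission, a model's per-pixel probability map must be binarized. A minimal sketch, where `probs` stands in for a hypothetical model output and the 0.5 threshold is an assumption you may want to tune:

```python
import numpy as np

# Hypothetical stand-in for a model's per-pixel foreground probabilities,
# e.g. the output of model.predict(...) reshaped to the image size.
probs = np.random.rand(720, 1280).astype(np.float32)

# Threshold (assumed 0.5) to obtain the binary mask expected by the scorer.
mask = (probs > 0.5).astype(np.uint8)
```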


Files are accessible once you are logged in and registered for the challenge.

The challenge provider