Challenge Data

Rakuten Multi-modal Colour Extraction
by Rakuten Institute of Technology, Paris



Competitive challenge
Economic sciences
Multimodal classification
More than 1GB
Advanced level


Started on Jan. 4, 2021

Challenge context


Rakuten, created in 1997 in Japan and at the origin of the marketplace concept, has become one of the largest e-commerce platforms worldwide. Alongside its global marketplaces, Rakuten supports an ever-expanding list of acquisitions and strategic investments in disruptive industries and growing markets, such as communications, financial services, and digital content, and gathers more than one billion users in an international ecosystem.

Rakuten Institute of Technology (RIT) is the research and innovation department of Rakuten, with teams in Tokyo, Paris, Boston, San Mateo, Singapore, and Bengaluru. RIT does applied research in computer vision, natural language processing, machine/deep learning, and customer behaviour analysis.


This challenge focuses on predicting the colour attribute of products from large-scale multimodal (text and image) e-commerce product catalog data of the Rakuten Ichiba marketplace.

The catalog of product listings for any e-commerce marketplace consists of product information provided by the merchants. Typically, a merchant provides the title, description, and image(s) of the product. Extracting attributes from this information is useful in several contexts, such as recommendations, search, and product discovery. Manual and rule-based approaches to attribute extraction do not scale because of the sheer size of the product catalog. Multimodal approaches are a natural fit, since the colour can be predicted either from the image or from the text that the merchant has uploaded. Advances in this area of research have been limited by the lack of real data from actual commercial catalogs. The challenge presents several interesting research aspects: the intrinsically noisy product labels and images, the size of modern e-commerce catalogs, and the typically unbalanced data distribution.

Legal Notice

By express derogation from any preexisting or future contractual documents and/or terms and conditions pertaining to the Rakuten Data Challenge occurring on the occasion of the Challenge Data of ENS and Collège de France (“Rakuten Data Challenge”), the participant (“Participant”) agrees to the following conditions in connection with the study data (“Study Data”) uploaded by Rakuten, Inc., 1-14-1 Tamagawa, Setagaya-ku, Tokyo, Japan, (the “Provider”) on the occasion of the Rakuten Data Challenge.

The Participant shall

(i) use the Study Data for the sole purpose of the good performance of the Rakuten Data Challenge (the “Purpose”),

(ii) notwithstanding the above, not show or disclose the Study Data in the result presentations of the Rakuten Data Challenge,

(iii) not use, apply, reveal, report, publish, extract or otherwise disclose to any third party all or part of the Study Data in any circumstances for a purpose other than the Purpose.

As of the termination of the Rakuten Data Challenge, the Participant shall immediately cease any use of the Study Data unless otherwise agreed by the Provider. The present specific terms shall remain in full force and effect until the termination of the Purpose and for a period of two (2) years following the termination date of the Purpose.


For any questions about this challenge, please contact the following address:

Challenge goals

The goal of this data challenge is to predict the "colour" of a product, given its image, title, and description. A product can be of multiple colours, making it a multi-label classification problem.

For example, the Rakuten Ichiba catalog contains a product with the Japanese title タイトリスト プレーヤーズ ローラートラベルカバー (Titleist Players Roller Travel Cover), associated with an image and sometimes with an additional description. The colour of this product is annotated as Red and Black. Other products have different titles and images, possibly a description, and their own colour attribute tags. Given this information about the products, the challenge is to build a multi-label classifier that assigns each product its corresponding colour attributes.


The metric used in this challenge to rank the participants is the weighted-F1 score.

The scikit-learn package provides an F1 score implementation (sklearn.metrics.f1_score), which can be used for this challenge with its average parameter set to "weighted".
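To make the metric concrete, here is a minimal pure-Python sketch of a support-weighted F1 over multi-label colour tags. It is intended to mirror what scikit-learn's f1_score computes with average="weighted" on binary indicator matrices; the official scoring script is not shown on this page, so treat that correspondence as an assumption. The function name weighted_f1 and the two toy items are illustrative only.

```python
from collections import Counter


def weighted_f1(true_tags, pred_tags):
    """Support-weighted mean of per-colour F1 scores over multi-label items."""
    tp, fp, fn, support = Counter(), Counter(), Counter(), Counter()
    for truth, pred in zip(true_tags, pred_tags):
        truth, pred = set(truth), set(pred)
        for colour in truth:
            support[colour] += 1          # number of true occurrences per colour
        for colour in truth & pred:
            tp[colour] += 1               # correctly predicted
        for colour in pred - truth:
            fp[colour] += 1               # predicted but not annotated
        for colour in truth - pred:
            fn[colour] += 1               # annotated but not predicted

    total = sum(support.values())
    if total == 0:
        return 0.0

    score = 0.0
    # Colours that never occur in the ground truth have zero support,
    # so they contribute nothing to the weighted average.
    for colour, n in support.items():
        denom = 2 * tp[colour] + fp[colour] + fn[colour]
        f1 = 2 * tp[colour] / denom if denom else 0.0
        score += n * f1
    return score / total


# Toy example (not challenge data): 'Green' is missed on the first item.
y_true = [["White", "Green", "Black"], ["Pink", "Blue"]]
y_pred = [["White", "Black"], ["Pink", "Blue"]]
score = weighted_f1(y_true, y_pred)
print(score)
```

Because the weights are the per-colour supports, frequent colours dominate the score, which matters given the unbalanced label distribution mentioned in the challenge context.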

Data description

For this challenge, Rakuten is releasing approximately 250K item listings in CSV format, split into a training set (212,659 items) and a test set (37,528 items). The dataset consists of product titles, product descriptions, product images, and their corresponding colour attribute tags. There are 19 unique colour tags in the dataset.

The data are divided along two criteria (training or test, input or output), forming four distinct sets. Three of them are provided as files; the fourth, the test output, is what participants must predict:

  1. X_train.csv: training input file
  2. Y_train.csv: training output file
  3. X_test.csv: test input file

Additionally, a compressed file containing all the images is supplied. Uncompressing this file produces a folder named images with all the item images.

The first line of each file contains the header, and the columns are separated by commas (',').

The columns of the input files (X_train.csv and X_test.csv) are:

  1. image_file_name - The name of the image file in the images folder corresponding to the item.
  2. item_name - The item title, a short text summarizing the item.
  3. item_caption - A more detailed text describing the item. Not all merchants use this field, and the data are kept as provided, so this field contains a NaN value for many products.

Here is an example of an input file:

296409_10002365_1.jpg,【 UES(ウエス) 】 67LW オリジナル吊り編み鹿の子ポロシャツ [ Tシャツ ][ アメカジ ] [ メンズ ] [ 送料・代引き手数料無料 ],【 商品について 】 ※以下の点を予めご了承下さい。 ■実寸サイズは当店在庫の1点を採寸しております。個体差により、若干のサイズ誤差が生じる場合がございます。 ■商品画像はお使いのパソコン・携帯の種類や環境により、色・質感等に若干の誤差が生じる場合がございます。 ■この商品は当店実店舗・他サイトでも販売しております。在庫数の更新は随時行っておりますが、時間差により、品切れになってしまうこともございます。
214151_10015053_1.jpg,SALE ゴルフウェア メンズ トップス ベスト 春 おしゃれ 大きいサイズ ゴルフ ウェア アーガイル チェック メンズ ニット Vネック スポーツウェア 大人 サックス ピンク M〜XXLアーガイル柄Vネックスプリングゴルフベスト(CG-BS913),---- ギフト用の包袋をご一緒にいかがでしょうか ----▼関連キーワードゴルフウェア メンズ ゴルフ ウェアメンズ ストレッチ おしゃれ 大きいサイズ ゴルフ 飛距離upゴルフウエア ストレッチ 飛距離アップ ゴルフパンツ ストレッチ春 夏 秋 冬 メンズ
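As a rough illustration of reading an input file with the three documented columns, the sketch below uses Python's standard csv module on a small made-up sample. The sample rows, the read_items helper, and the choice to treat an empty or "nan" caption as missing are assumptions for illustration, not part of the challenge material.

```python
import csv
import io

# Hypothetical two-row sample using the documented header; not real challenge data.
SAMPLE_X = (
    "image_file_name,item_name,item_caption\n"
    "a_1.jpg,Red travel cover,Sturdy cover with wheels\n"
    "b_1.jpg,Blue polo shirt,\n"
)


def read_items(handle):
    """Read input rows and merge title and caption into one text field."""
    items = []
    for row in csv.DictReader(handle):
        caption = (row.get("item_caption") or "").strip()
        if caption.lower() == "nan":      # assumed marker for a missing description
            caption = ""
        text = row["item_name"].strip()
        if caption:
            text = text + " " + caption
        items.append({"image": row["image_file_name"], "text": text})
    return items


items = read_items(io.StringIO(SAMPLE_X))
for item in items:
    print(item["image"], "|", item["text"])
```

For the real files, the same function could be called on a handle such as open("X_train.csv", newline="", encoding="utf-8"); the encoding is an assumption and may need adjusting.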

The training output file (Y_train.csv) contains the color_tags column, the target of the classification task, for each product in the training input file (X_train.csv). Here too, the first line of the file is the header. There is a one-to-one mapping between the lines of the training input and training output files.

Here is an example of the output file corresponding to the above example of the input file:

['White', 'Green', 'Black']
['Pink', 'Blue']
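Since each color_tags value is a list written as text, it has to be parsed before training. The sketch below shows one way to do this with ast.literal_eval and to turn the tags into multi-hot target vectors. It assumes the list is stored as a quoted CSV field (as in the prediction file format); the sample string and helper names are hypothetical, and the colour vocabulary is derived from the data rather than hard-coded.

```python
import ast
import csv
import io

# Hypothetical sample of a training output file; quoting of the field is assumed.
SAMPLE_Y = '''color_tags
"['White', 'Green', 'Black']"
"['Pink', 'Blue']"
'''


def read_tags(handle):
    """Parse each color_tags cell from its string form into a Python list."""
    return [ast.literal_eval(row["color_tags"]) for row in csv.DictReader(handle)]


def to_multi_hot(tag_lists):
    """Build a sorted colour vocabulary and one 0/1 vector per item."""
    vocab = sorted({colour for tags in tag_lists for colour in tags})
    index = {colour: i for i, colour in enumerate(vocab)}
    vectors = []
    for tags in tag_lists:
        vec = [0] * len(vocab)
        for colour in tags:
            vec[index[colour]] = 1
        vectors.append(vec)
    return vocab, vectors


labels = read_tags(io.StringIO(SAMPLE_Y))
vocab, targets = to_multi_hot(labels)
print(vocab)
print(targets)
```

On the full training set the vocabulary built this way should contain the 19 colour tags mentioned in the data description.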

Submission File Format

For the test input file X_test.csv, participants need to provide a test output file in the same format as the training output file. The first line of this test output file should contain the header color_tags, followed by one list of predicted colour tags per line. Recall that each item may have multiple colour tags. There should be a one-to-one correspondence between the lines of the predicted test output file and the lines describing the items in the test input file (X_test.csv). A sample prediction file is also provided to show the expected format.

Here is an example of an expected prediction file:

"['Brown', 'Navy', 'Transparent', 'White']"
"['Burgundy', 'Red']"
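One possible way to produce a file in this shape is sketched below with the standard csv module, which quotes any field containing a comma. The write_predictions helper is hypothetical, and the exact quoting rules accepted by the platform are not stated on this page, so the provided sample prediction file remains the reference.

```python
import csv
import io


def write_predictions(handle, predictions):
    """Write a color_tags header followed by one stringified tag list per item."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["color_tags"])
    for tags in predictions:
        # str(list) gives e.g. "['Burgundy', 'Red']"; the writer adds the outer
        # double quotes because the field contains commas.
        writer.writerow([str(list(tags))])


buffer = io.StringIO()
write_predictions(
    buffer,
    [["Brown", "Navy", "Transparent", "White"], ["Burgundy", "Red"]],
)
output = buffer.getvalue()
print(output)
```

Predictions must be written in the same order as the rows of X_test.csv to keep the one-to-one correspondence. Note that a single-tag list contains no comma and would be written without outer quotes by this sketch.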

Benchmark description

The benchmark model uses only the images. Participants are nevertheless encouraged to use both images and text when designing a classifier, since the two contain complementary information.

The image-based classifier uses a Densely Connected Convolutional Network (DenseNet). DenseNet121, pre-trained on ImageNet and taken from the PyTorch model hub, serves as the image feature extractor. For each image, the output of the global average pooling layer is fed as input features to a fully-connected layer with 19 outputs (one per colour category). The loss function is the per-label binary cross-entropy loss.

During inference, the model computes a score between 0 and 1 for each category by applying a sigmoid to the corresponding output. A colour is predicted for an item if its score is greater than 0.5.
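The thresholding step can be illustrated with a few lines of plain Python. The decode helper, the three-colour list, and the raw output values are made up for the example; only the 0.5 threshold comes from the description above.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode(logits, colours, threshold=0.5):
    """Return the colours whose sigmoid score exceeds the threshold."""
    return [c for c, z in zip(colours, logits) if sigmoid(z) > threshold]


colours = ["Black", "Blue", "Red"]      # illustrative subset of the colour tags
predicted = decode([2.0, -1.0, 0.3], colours)
print(predicted)
```

With this rule, an item whose scores are all at or below 0.5 receives an empty list; how the benchmark handles that case is not specified.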

Benchmark Performance

Following is the weighted-F1 score obtained using the benchmark models described above on the images:




The challenge provider


Research wing of Rakuten, one of the largest e-commerce companies in the world