Challenge Data

Where will the next trade take place?
by CFM




Description


Competitive challenge
Economic sciences
Finance
Classification
Tabular
10MB to 1GB
Basic level

Dates

Started on Jan. 6, 2020


Challenge context

Most financial markets use an electronic trading mechanism called a limit order book to facilitate the trading of assets (stocks, futures, options, etc.). Participants submit (or cancel) orders to this electronic order book. These orders are requests to buy or sell a given quantity of an asset at a specified price, thus allowing buyers to be matched with sellers at a mutually agreed price [1][2]. Since an asset can be traded on multiple trading venues, participants can choose to which venue they send an order. For instance, a US stock can be traded on various exchanges, such as NYSE, NASDAQ, Direct Edge or BATS. When sending an order, participants generally select the best available trading venue at that time [3]. Their decisions may include a statistical analysis of past venue activity.

[1] Trades, Quotes and Prices - Financial Markets Under the Microscope – Jean-Philippe Bouchaud, Julius Bonart, Jonathan Donier, Martin Gould – 2018
[2] https://en.wikipedia.org/wiki/Order_book_(trading)
[3] https://en.wikipedia.org/wiki/Smart_order_routing


Challenge goals

Given recent trades and order books from a set of trading venues, predict on which trading venue the next trade will be executed.

Community forum for sharing ideas and making faster progress:

http://datachallenge.cfm.fr/

Additional information can also be found on this forum and after registering on the Challenge Data website.


Data description

How to load the dataset with pandas

The dataset is stored in an HDF5 file and can be loaded with pandas by running the following command:

>>> import pandas
>>> train_data = pandas.read_hdf("train_data.h5", "data")

Labels are stored in a CSV file and can be loaded as follows:

>>> labels = pandas.read_csv("train_labels.csv")

Description of rows in the dataset

For each row, we want to predict on which venue the next trade will be executed. The stock is represented by a randomized stock_id and the day by a randomized day_id.

Each row provides a description of six order books, from six trading venues, and a history of trades for the corresponding asset.

Order books

An order book lists the quantities of an asset that are currently on offer by sellers (who ask for higher prices) and the quantities that buyers wish to acquire (who bid at lower prices). The six order books (one for each trading venue) are described in the dataset through the best two bids and the best two asks, that is, the two highest bid prices from buyers and the two lowest ask prices from sellers.

We call aggregate volume the sum of all quantities on both the bid and ask sides (only considering the best two bids and best two asks) in the six given books.

We also define the aggregate mid-price as the average of the best bid among the six books (i.e. the maximum of the six best bids) and the best ask among the six books (i.e. the minimum of the six best asks).
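The two definitions above can be illustrated with a minimal sketch. The raw prices and sizes below are made-up toy values (the dataset itself only exposes normalized columns), and skipping NaN entries for empty levels is an assumption, not something the challenge description specifies.

```python
import numpy as np

# Toy raw books for six venues (hypothetical values, not taken from the dataset).
# Each row is one venue; columns are [best level, second-best level].
bid_prices = np.array([[99.98, 99.97], [99.99, 99.98], [99.97, 99.96],
                       [99.99, 99.97], [99.98, 99.96], [99.96, 99.95]])
ask_prices = np.array([[100.02, 100.03], [100.01, 100.02], [100.03, 100.04],
                       [100.02, 100.04], [100.01, 100.03], [100.04, 100.05]])
bid_sizes = np.array([[100, 200], [150, 50], [300, 100],
                      [50, 50], [200, 100], [100, 100]], dtype=float)
ask_sizes = np.array([[120, 80], [100, 100], [50, 150],
                      [200, 100], [100, 50], [150, 50]], dtype=float)


def aggregate_volume(bid_sizes, ask_sizes):
    """Sum of all quantities over the two best levels of both sides, all venues.

    Assumption: NaN entries (empty levels) are ignored.
    """
    return float(np.nansum(bid_sizes) + np.nansum(ask_sizes))


def aggregate_mid_price(bid_prices, ask_prices):
    """Average of the highest best bid and the lowest best ask across venues."""
    best_bid = np.nanmax(bid_prices[:, 0])  # column 0 holds each venue's best bid
    best_ask = np.nanmin(ask_prices[:, 0])  # column 0 holds each venue's best ask
    return float((best_bid + best_ask) / 2.0)


agg_volume = aggregate_volume(bid_sizes, ask_sizes)
mid_price = aggregate_mid_price(bid_prices, ask_prices)
print(agg_volume, mid_price)
```

With these toy numbers the best bid across venues is 99.99 and the best ask is 100.01, so the aggregate mid-price is approximately 100.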

Each of the six books is described as follows:

  • The 'bid' column (respectively 'ask') represents the difference between the best bid (respectively best ask) and the aggregate mid-price, expressed in some fixed currency unit.

  • The 'bid1' column (respectively 'ask1') represents the difference between the second best bid (respectively second best ask) and the aggregate mid-price, expressed in some fixed currency unit.

  • The 'bid_size' column (respectively 'ask_size') represents the total number of stocks available at the best bid (respectively best ask) divided by the aggregate volume.

  • The 'bid_size1' column (respectively 'ask_size1') represents the total number of stocks available at the second best bid (respectively at the second best ask) divided by the aggregate volume.

  • The 'ts_last_update' column corresponds to the timestamp, given as a number of microseconds since midnight (local time), of the last update of the book.

A NaN value indicates an empty book or a partially empty book.

Note that all prices are expressed in the same fixed currency unit.

Moreover, since all the trading venues belong to the same time zone, all the timestamps share the same time reference.
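As a rough illustration of how the book columns listed above relate to raw quotes, the sketch below maps one venue's raw book to the relative columns. The raw key names (`bid_px`, `bid_qty`, and so on), the function name, and the numeric inputs are hypothetical; only the output column names come from the description.

```python
def normalize_book(raw, mid_price, agg_volume):
    """Map one venue's raw book to the relative columns described above.

    `raw` uses hypothetical key names for raw prices and share counts.
    Prices become differences to the aggregate mid-price; sizes become
    fractions of the aggregate volume. NaN inputs simply propagate.
    """
    return {
        "bid": raw["bid_px"] - mid_price,
        "ask": raw["ask_px"] - mid_price,
        "bid1": raw["bid1_px"] - mid_price,
        "ask1": raw["ask1_px"] - mid_price,
        "bid_size": raw["bid_qty"] / agg_volume,
        "ask_size": raw["ask_qty"] / agg_volume,
        "bid_size1": raw["bid1_qty"] / agg_volume,
        "ask_size1": raw["ask1_qty"] / agg_volume,
    }


nan = float("nan")

# Toy inputs: a full book, and a partially empty book with no second bid level.
full_raw = {"bid_px": 99.98, "ask_px": 100.02, "bid1_px": 99.97, "ask1_px": 100.03,
            "bid_qty": 100.0, "ask_qty": 120.0, "bid1_qty": 200.0, "ask1_qty": 80.0}
partial_raw = dict(full_raw, bid1_px=nan, bid1_qty=nan)

book = normalize_book(full_raw, mid_price=100.0, agg_volume=2750.0)
partial_book = normalize_book(partial_raw, mid_price=100.0, agg_volume=2750.0)
print(book)
print(partial_book)
```

In the second call the missing level shows up as NaN in 'bid1' and 'bid_size1', which is how a partially empty book would appear under this sketch.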

Trades

Each row also comprises a description of the last ten trades (ordered from the most recent to the oldest) for the corresponding asset. A trade represents a transaction of a certain quantity of an asset at a given price between a buyer and a seller. Most trades result from the matching of an order in an order book with an incoming order, but they can also result from a different trading mechanism such as an auction. Trades often involve fees, paid by the buyer and the seller, which vary from one trading venue to another. Each of the ten trades in the history is described as follows:

  • Its quantity ('qty'): the number of stocks traded, divided by the aggregate volume (defined above in the section "Order books").

  • Its timestamp ('tod'): when the trade was executed, given as a number of microseconds since midnight (local time).

  • Its price ('price'): the difference between the trade price and the aggregate mid-price (defined in the section "Order books"), expressed in some fixed currency unit.

  • Its source ('source_id'): the trading venue on which this particular trade was executed.
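The trade history lends itself to simple summary features. The sketch below, on made-up values, computes the share of the last ten trades executed on each venue and the gaps between consecutive trades in seconds. The array layout (one array per field, most recent trade first) and the function names are assumptions for illustration; the actual column layout of the dataset may differ.

```python
import numpy as np

N_VENUES = 6

# Toy history of ten trades, most recent first (hypothetical values).
source_id = np.array([2, 2, 0, 5, 2, 1, 0, 2, 3, 2], dtype=np.int64)
# 'tod' is in microseconds since midnight; 36_000_000_000 is 10:00:00.
tod = np.array([36_000_900_000, 36_000_850_000, 36_000_700_000, 36_000_650_000,
                36_000_400_000, 36_000_390_000, 36_000_100_000, 35_999_900_000,
                35_999_500_000, 35_999_000_000], dtype=np.int64)


def venue_shares(source_id, n_venues=N_VENUES):
    """Fraction of the trades in the history executed on each venue."""
    counts = np.bincount(source_id, minlength=n_venues)
    return counts / counts.sum()


def inter_trade_gaps_seconds(tod):
    """Elapsed time between consecutive trades, in seconds.

    The history is ordered most recent first, so np.diff yields negative
    values and the sign is flipped to obtain positive gaps.
    """
    return -np.diff(tod) / 1e6


shares = venue_shares(source_id)
gaps = inter_trade_gaps_seconds(tod)
print(shares)
print(gaps)
```

Here venue 2 accounts for half of the toy history, and the two most recent trades are 0.05 seconds apart.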

Description of labels

The labels correspond to the trading venues to predict. Each of the six trading venues is represented by a number between 0 and 5.


Benchmark description

The baseline simply predicts the trading venue of the most recent trade in the history.
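A minimal sketch of this baseline on a toy table follows. The column name `source_id_0` for the most recent trade's venue is hypothetical, as are the toy labels; the match rate computed at the end is only there to show how predictions could be compared with labels, not a statement of the official scoring.

```python
import pandas as pd

# Toy rows: venue of the most recent trade and of the one before it.
# Column names are hypothetical; check the real dataset for the actual ones.
toy_trades = pd.DataFrame({
    "source_id_0": [2, 0, 5, 3],
    "source_id_1": [1, 0, 4, 3],
})
toy_labels = pd.Series([2, 1, 5, 3])  # made-up "next trade" venues


def last_venue_baseline(df, most_recent_col="source_id_0"):
    """Predict that the next trade happens on the venue of the latest trade."""
    return df[most_recent_col].astype(int)


pred = last_venue_baseline(toy_trades)
match_rate = float((pred == toy_labels).mean())
print(pred.tolist(), match_rate)
```

On this toy table three of the four predictions match the labels.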


Files


Files are accessible once you are logged in and registered for the challenge.


The challenge provider



Founded in 1991, Capital Fund Management (CFM) is a successful alternative asset manager and a pioneer in the field of quantitative trading applied to capital markets across the globe. Our methodology relies on statistically robust analysis of terabytes of data to inform and enable our asset allocation, trading decisions and automated order execution. Our people’s diversity and dedication contribute to CFM’s unique culture of research, innovation and achievement. We are a Great Place to Work company and we offer a collaborative and informal work environment, attractive offices and facilities.


Congratulations to the winners of the challenge


1 Quentin Jacob
2 Romain Poncet
3 -

You can find the whole list of winners of the season here