Market surveillance
Started on Jan. 4, 2021
Financial markets are made up of various types of market players, each with specific interests and hence heterogeneous behaviours. For regulators, it is important to know the different types of market players in order to better understand how their behaviour has an impact on the market. Since the emergence of High-Frequency Trading (HFT) [1] more than a decade ago, financial sector authorities as well as academics have widely studied the impact and influence on markets of these market players, who invest in powerful low-latency infrastructure to transact a large number of orders in fractions of a second [2].
HFTs as well as other market players submit orders to an electronic trading mechanism called a Limit Order Book (LOB): orders are requests to buy or sell a given quantity of an asset at a specified price, thus allowing buyers to be matched with sellers at a mutually agreed price. Market players can modify the price or the quantity of their orders as long as they remain in the LOB; they can also cancel them (modifications and cancellations are tagged as "events").
For this challenge, we focus on equity markets. Since a given stock can be traded on multiple trading venues (market fragmentation), participants can choose the venue to which they send an order (each trading venue has its own distinct LOB). Thanks to their speed advantage, HFTs, who seek direct market gains based on small price variations, often apply arbitrage strategies [3] between several trading venues. In addition, HFTs generally send more events into the LOB than other participants, or are at least able to send two events separated by the shortest possible time interval.
[1] High Frequency trading definition
[2] AMF paper on HFTs behavior on Euronext Paris
[3] Arbitrage definition
The goal of the challenge is to classify traders into three categories: HFT, MIX and non-HFT.
These categories come from the AMF's in-house expert-based classification, which relies on the AMF's knowledge of market players.
From a set of behavioural variables based on order and transaction data, the challenger is invited to predict the category to which a given participant belongs.
The proposed classification algorithm will then be applied to other data sources for which market players are currently not well known by the AMF.
Each market player (i.e. participant) i is represented by a matrix X_i, each row of which provides that market player's behaviour variables calculated for a given stock and a given trading date. Since not all market players are active every day or across the whole scope of assets, the number of rows of the matrices may vary from participant to participant. The columns of X_i contain the features (detailed below).
The objective is to find the function f such that f(X_i) = y_i, where y_i ∈ {HFT, MIX, NON HFT}; in other words, y_i refers to the market player's category.
The y_train and y_test files contain the market players' identification codes and the category they belong to (HFT, MIX or NON HFT). Participants falling under the MIX category are those who sometimes, but not systematically, use HFT algorithms.
The x_train and x_test files each contain the equivalent of 1 month of data (x_train data are prior to x_test data). The scope of market players is roughly the same in the y_train and y_test data, but the market players' identification codes have been changed, so that a market player appearing in the train (x and y) files cannot be identified in the test (x and y) files.
The rows of the x_train and x_test data contain the same 35 features, calculated for a given market player i on a given stock (whose identification code is an ISIN) and a specific trading date:
The features above are not detailed in the same order as in the challenge files.
For example (based on the columns' order in the train files): "Isin_1, Date_1, Trader_1, 5, 2.3, 10, …" means that for Isin_1 on Date_1, we observed that Trader_1 has an OTR equal to 5, an OCR equal to 2.3 and an OMR equal to 10.
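The row layout in this example can be illustrated with a small parsing sketch. It assumes that the first six columns appear in the order shown (Isin, date, trader, OTR, OCR, OMR) and ignores the remaining features. The names `parse_row` and `group_by_trader`, the field names, and all sample rows other than the first are hypothetical and not taken from the challenge files.

```python
import csv
from collections import defaultdict
from io import StringIO

# Assumed order of the first six columns, following the example row above.
FIELDS = ["isin", "date", "trader", "OTR", "OCR", "OMR"]

def parse_row(values):
    """Turn one raw row (list of strings) into a dict with typed fields."""
    record = dict(zip(FIELDS, (v.strip() for v in values)))
    for name in ("OTR", "OCR", "OMR"):
        record[name] = float(record[name])
    return record

def group_by_trader(text):
    """Group parsed rows per trader: one variable-length list of rows per participant."""
    matrices = defaultdict(list)
    for values in csv.reader(StringIO(text)):
        if values:  # skip blank lines
            record = parse_row(values)
            matrices[record["trader"]].append(record)
    return dict(matrices)

# Illustrative data only: the first row is the example from the text,
# the other two are made up.
sample = (
    "Isin_1, Date_1, Trader_1, 5, 2.3, 10\n"
    "Isin_2, Date_1, Trader_1, 4, 1.1, 7\n"
    "Isin_1, Date_1, Trader_2, 1, 1.0, 2\n"
)
matrices = group_by_trader(sample)
```

Here `matrices["Trader_1"]` holds two rows and `matrices["Trader_2"]` one, which mirrors the fact that the number of rows differs from participant to participant.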
Finally, within the train set and within the test set, the link between the x and y data can be established thanks to the market player's identification code.
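As a rough illustration of how the identification code can serve as a join key, the sketch below attaches a category to each feature row. The row layout (trader, Isin, date, then feature values), the sample values and the function name `attach_labels` are assumptions made for the example; the actual files contain 35 features.

```python
# Hypothetical feature rows: (trader_id, isin, date) followed by feature values.
x_rows = [
    ("Trader_1", "Isin_1", "Date_1", 5.0, 2.3, 10.0),
    ("Trader_1", "Isin_2", "Date_1", 4.0, 1.1, 7.0),
    ("Trader_2", "Isin_1", "Date_1", 1.0, 1.0, 2.0),
]

# Hypothetical labels keyed by the market player's identification code.
y_labels = {"Trader_1": "HFT", "Trader_2": "NON HFT"}

def attach_labels(rows, labels):
    """Return (features, category) pairs; each row inherits its trader's category."""
    pairs = []
    for trader, _isin, _date, *features in rows:
        pairs.append((features, labels[trader]))
    return pairs

labelled = attach_labels(x_rows, y_labels)
```

Every row of a given trader receives the same category, so a row-level model trained on such pairs still needs an aggregation step to produce one prediction per market player.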
[1] TV_1 is the trading venue with the highest volume traded
[2] Events include both the transactions and the messages that market players can send to the LOB: new order, order modification or order cancellation.
A basic random forest, combined with additional threshold-based rules that determine what percentage of a market player's rows must fall into a category for the player to be assigned to it, gave us a micro-averaged F1-score of ~90%. For this "naive" model, we considered that a market player whose:
Otherwise the model considers that the market player is a NON HFT.
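The aggregation step and the metric can be sketched as follows. This is not the benchmark's code: the two thresholds are placeholders chosen for illustration, the order in which the rules are checked is an assumption, and the row-level predictions are taken as given (they could come from a random forest or any other row-level classifier). The names `classify_trader` and `micro_f1` are hypothetical.

```python
from collections import Counter

# Placeholder thresholds for illustration; NOT the benchmark's actual values.
HFT_SHARE = 0.85
MIX_SHARE = 0.50

def classify_trader(row_predictions, hft_share=HFT_SHARE, mix_share=MIX_SHARE):
    """Assign one category to a trader from the share of its rows in each class."""
    if not row_predictions:
        raise ValueError("a trader needs at least one row")
    counts = Counter(row_predictions)
    n = len(row_predictions)
    if counts["HFT"] / n >= hft_share:
        return "HFT"
    if counts["MIX"] / n >= mix_share:
        return "MIX"
    return "NON HFT"

def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1
            elif truth == label:
                fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

# Made-up row-level predictions for three traders.
row_preds = {
    "Trader_1": ["HFT"] * 9 + ["MIX"],
    "Trader_2": ["MIX", "MIX", "NON HFT"],
    "Trader_3": ["NON HFT", "HFT", "NON HFT", "NON HFT"],
}
predicted = [classify_trader(rows) for rows in row_preds.values()]
score = micro_f1(["HFT", "MIX", "MIX"], predicted)
```

When each market player receives exactly one category, the micro-averaged F1-score coincides with plain accuracy, which is why a single misclassified trader out of three gives a score of 2/3 in this toy example.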