Challenge Data

Learning factors for stock market returns prediction
by QRT




Description


Dates

Started on Jan. 5, 2022



Challenge context

A classic prediction problem in finance is to predict the next returns (i.e. relative price variations) of a stock market. That is, given a stock market of $N$ stocks with returns $R_t\in\mathbb R^N$ at time $t$, the goal is to design at each time $t$ a vector $S_{t+1}\in\mathbb R^N$, using only the information available up to time $t$, such that the prediction overlap $\langle S_{t+1},R_{t+1}\rangle$ is frequently positive. To be fair, this is not an easy task. In this challenge, we attack this problem with a linear factor model in which the factors themselves are learned over an exotic non-linear parameter space.

NB: There is a dedicated forum for this challenge.

More precisely, since the simplest estimators are the linear ones, a typical approach is to consider a parametric model of the form

$$S_{t+1}:=\sum_{\ell=1}^F \beta_\ell \, F_{t,\ell}$$

where the vectors $F_{t,\ell}\in\mathbb R^N$ are explanatory factors (a.k.a. features), usually designed from financial expertise, and $\beta_1,\ldots,\beta_F\in\mathbb R$ are model parameters that can be fitted on a training data set.

But how should one design the factors $F_{t,\ell}$?

Factors that are “well known” in the trading world include the 5-day (normalized) mean return $R_t^{(5)}$ and the momentum $M_t:= R_{t-20}^{(230)}$, where $R_t^{(m)}:=\frac{1}{\sqrt{m}}\sum_{k=1}^{m} R_{t+1-k}$. But if you know no finance and have developed enough taste for mathematical elegance, you may aim at learning the factors themselves within the simplest class of factors, namely linear functions of the past returns:

$$F_{t,\ell}:=\sum_{k=1}^{D} A_{k\ell} \, R_{t+1-k}$$

for some vectors $A_\ell:=(A_{k\ell})_{1\le k\le D}\in\mathbb R^D$ and a fixed time depth parameter $D$.
Well, we need to add a condition that makes the factors sufficiently independent, since otherwise they may be redundant. One way to do this is to require the vectors $A_\ell$ to be orthonormal, $\langle A_k,A_\ell\rangle = \delta_{k\ell}$ for all $k,\ell$, which adds a non-linear constraint to the parameter space of our predictive model.
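A minimal NumPy sketch of the learned factors and the resulting predictor, assuming the returns are stored as an $N\times T$ array with one row per stock and one column per day (the layout of $X_{train}$ described in the data section). The function names `learned_factors` and `predict` are hypothetical and not part of the challenge material.

```python
import numpy as np


def learned_factors(R, A, t):
    """Return F_t = [F_{t,1}, ..., F_{t,F}] as an (N, F) array.

    R : (N, T) returns, one row per stock and one column per day.
    A : (D, F) weights; column l holds A_{1l}, ..., A_{Dl}.
    t : current day index, with t >= D - 1 so that R_{t+1-D} exists.
    """
    D = A.shape[0]
    if t < D - 1:
        raise ValueError("need at least D past days of returns")
    # Days t+1-D, ..., t in reverse order, so column k-1 holds R_{t+1-k}.
    window = R[:, t + 1 - D : t + 1][:, ::-1]
    return window @ A


def predict(R, A, beta, t):
    """Return S_{t+1} = sum_l beta_l F_{t,l} as an (N,) array."""
    return learned_factors(R, A, t) @ beta


# Usage on random data: 5 stocks, 40 days, D = 12 lags, F = 3 factors.
rng = np.random.default_rng(0)
R = rng.standard_normal((5, 40))
A, _ = np.linalg.qr(rng.standard_normal((12, 3)))  # orthonormal columns
beta = rng.standard_normal(3)
S_next = predict(R, A, beta, t=30)  # predictor of R[:, 31]
```

Reversing the window once lets a single matrix product evaluate all $F$ lag sums at the same time.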

All in all, we thus have at hand a predictive parametric model with parameters:

  • a $D\times F$ matrix $A:=[A_1,\ldots,A_F]$ with orthonormal columns,
  • a vector $\beta:=(\beta_1,\ldots,\beta_F)\in\mathbb R^F$.

Note that this model contains as submodels the two-factor model built from $R_t^{(5)}$ and $M_t$ defined above, as well as the autoregressive (AR) model from time series analysis.
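To see the first claim, take $D=250$ and $F=2$ with the following columns, where $e_k$ denotes the $k$-th standard basis vector of $\mathbb R^{250}$ (coordinate $k$ weights the lagged return $R_{t+1-k}$):

$$A_1=\frac{1}{\sqrt5}\sum_{k=1}^{5}e_k,\qquad A_2=\frac{1}{\sqrt{230}}\sum_{k=21}^{250}e_k.$$

Then, substituting $j=k-20$ in the second sum,

$$F_{t,1}=\frac{1}{\sqrt5}\sum_{k=1}^{5}R_{t+1-k}=R_t^{(5)},\qquad F_{t,2}=\frac{1}{\sqrt{230}}\sum_{k=21}^{250}R_{t+1-k}=\frac{1}{\sqrt{230}}\sum_{j=1}^{230}R_{(t-20)+1-j}=R_{t-20}^{(230)}=M_t.$$

Both columns have unit norm ($5\cdot\frac15=230\cdot\frac1{230}=1$) and disjoint supports, so $\langle A_1,A_2\rangle=0$ and the orthonormality constraint holds. Likewise, an AR predictor $S_{t+1}=\sum_{k=1}^{D}\phi_k R_{t+1-k}$ applied to every stock with common coefficients $\phi\neq 0$ is the case $F=1$, $A_1=\phi/\|\phi\|$, $\beta_1=\|\phi\|$.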


Challenge goals

The goal of this challenge is to design/learn factors for stock return prediction using the exotic parameter space introduced in the context section.

Participants can use a three-year history of daily returns for 50 stocks from the same stock market (training data set) to produce the model parameters $(A,\beta)$ as output. The predictive model associated with these parameters is then tested on predicting the returns of 50 other stocks over the same three-year period (testing data set).

We set the time depth to $D=250$ days and the number of factors to $F=10$.

Metric. More precisely, we assess the quality of the predictive model with parameters $(A,\beta)$ as follows. Let $\tilde R_t\in\mathbb R^{50}$ be the returns of the 50 stocks of the testing data set over the three-year period ($t=0,\ldots,753$) and let $\tilde S_{t} = \tilde S_{t}(A,\beta)$ be the participants' predictor for $\tilde R_{t}$, i.e. $\tilde S_t=\sum_{\ell=1}^{10}\beta_\ell\sum_{k=1}^{250}A_{k\ell}\tilde R_{t-k}$, which requires $t\ge 250$. The metric to maximize is defined by

$$\mathrm{Metric}(A,\beta):= \frac 1{504}\sum_{t=250}^{753} \frac{\langle \tilde S_{t}, \tilde R_{t}\rangle}{\|\tilde S_{t}\|\,\|\tilde R_{t}\|}$$

if $|\langle A_i,A_j\rangle-\delta_{ij}|\leq 10^{-6}$ for all $i,j$, and $\mathrm{Metric}(A,\beta):=-1$ otherwise.

By construction, the metric takes values in $[-1,1]$ and equals $-1$ as soon as some pair $(i,j)$ violates the orthonormality condition by more than $10^{-6}$.
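A sketch of this metric in NumPy, assuming the returns are an $N\times T$ array (one row per stock, one column per day) and that the predictor for day $t$ is built from the $D$ previous days, $\tilde S_t=\sum_{\ell}\beta_\ell\sum_{k=1}^{D}A_{k\ell}\tilde R_{t-k}$. Since $\sum_\ell \beta_\ell A_{k\ell}=(A\beta)_k$, the predictor reduces to a single vector of lag weights $w=A\beta$, which the code exploits. The function name `metric` is hypothetical and this is not the official scoring code; a zero prediction ($\|\tilde S_t\|=0$) is not handled.

```python
import numpy as np


def metric(A, beta, R, tol=1e-6):
    """Mean cosine similarity between predictions and realised returns.

    A : (D, F) factor weights, beta : (F,) coefficients, R : (N, T) returns.
    Scores days t = D, ..., T-1 (t = 250..753, i.e. 504 days, for T = 754)
    and returns -1 if some |<A_i, A_j> - delta_ij| exceeds tol.
    """
    D, F = A.shape
    if np.max(np.abs(A.T @ A - np.eye(F))) > tol:
        return -1.0
    w = A @ beta  # S_t = sum_k w_k R_{t-k}
    T = R.shape[1]
    scores = []
    for t in range(D, T):
        window = R[:, t - D : t][:, ::-1]  # column k-1 holds R_{t-k}
        S = window @ w
        R_t = R[:, t]
        scores.append(S @ R_t / (np.linalg.norm(S) * np.linalg.norm(R_t)))
    return float(np.mean(scores))
```

Dividing by the number of scored days through `np.mean` gives the factor $\frac{1}{504}$ for the challenge sizes.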

Output structure. The output expected from the participants is a vector in which the model parameters $A=[A_1,\ldots,A_{10}]\in\mathbb R^{250\times 10}$ and $\beta\in\mathbb R^{10}$ are stacked as follows:

$$\text{Output} = \left[\begin{matrix} A_1 \\ \vdots \\ A_{10} \\ \beta \end{matrix}\right]\in\mathbb R^{2510}$$
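A sketch of this stacking, assuming $A$ is held as a $250\times 10$ NumPy array whose columns are $A_1,\ldots,A_{10}$. Flattening in column-major order (`order="F"`) puts $A_1$ first, then $A_2$, and so on, giving $250\cdot 10+10=2510$ entries. The helper names `to_output` and `from_output` are hypothetical, and the file format expected by the submission platform is not covered here.

```python
import numpy as np


def to_output(A, beta):
    """Stack the columns A_1, ..., A_F of A, then beta, into one 1-D vector."""
    # Column-major ("F") order flattens A[:, 0] first, then A[:, 1], etc.
    return np.concatenate([A.ravel(order="F"), beta])


def from_output(vec, D=250, F=10):
    """Inverse of to_output: recover A (D x F) and beta (F,) from the vector."""
    A = vec[: D * F].reshape((D, F), order="F")
    beta = vec[D * F :]
    return A, beta


# Usage with the challenge sizes D = 250, F = 10.
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((250, 10)))
beta = rng.standard_normal(10)
out = to_output(A, beta)  # shape (2510,)
```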


Data description

The training input given to the participants, $X_{train}$, is a dataframe containing the (cleaned) daily returns of 50 stocks over a period of 754 days (three years). Each row represents a stock and each column refers to a day. $X_{train}$ should be used to find the predictive model parameters $A,\beta$.

The returns to be predicted in the training data set are provided in $Y_{train}$ for convenience, but they are also contained in $X_{train}$.


Benchmark description

A possible "brute force" procedure for this problem is to generate orthonormal vectors $A_1,\ldots,A_{10}\in\mathbb R^{250}$ at random, fit $\beta$ on the training data set by linear regression, repeat this operation many times, and finally select the best result among these attempts.

More precisely, the QRT benchmark strategy to beat is (see the notebook in the supplementary material):

Repeat the following $N_{iter}=1000$ times:

  1. Sample a $250\times 10$ matrix $M$ with iid Gaussian $N(0,1)$ entries.

  2. Apply the Gram-Schmidt algorithm to the columns of $M$ to obtain a matrix $A=[A_1,\ldots,A_{10}]$ with orthonormal columns (see the randomA function).

  3. Use the columns of $A$ to build the factors and then take the $\beta$ with minimal mean square error on the training data set (with fitBeta).

  4. Compute the metric on the training data (metricTrain).

Return the model parameters $(A,\beta)$ that maximize this metric.
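The procedure can be sketched in NumPy as follows. This is an illustrative reimplementation, not the notebook's randomA/fitBeta/metricTrain code: the helper names, the seed handling and the choice of training dates (every day $t\ge D$ is predicted from the $D$ preceding days) are assumptions. Gram-Schmidt is written in its modified form, which is numerically more stable but yields the same vectors in exact arithmetic.

```python
import numpy as np


def gram_schmidt(M):
    """Orthonormalise the columns of M (modified Gram-Schmidt)."""
    Q = np.array(M, dtype=float)  # copy, so M is left untouched
    for j in range(Q.shape[1]):
        for i in range(j):
            Q[:, j] -= (Q[:, i] @ Q[:, j]) * Q[:, i]
        Q[:, j] /= np.linalg.norm(Q[:, j])
    return Q


def factor_design(R, A):
    """Regression data for all prediction days t = D, ..., T-1.

    Row (t, stock i) of X holds F_{t-1, l} for stock i; y holds R_t[i].
    """
    D, T = A.shape[0], R.shape[1]
    X = np.concatenate([R[:, t - D : t][:, ::-1] @ A for t in range(D, T)])
    y = np.concatenate([R[:, t] for t in range(D, T)])
    return X, y


def fit_beta(R, A):
    """Least-squares beta (no intercept) on the training returns."""
    X, y = factor_design(R, A)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def mean_cosine(R, A, beta):
    """Average cosine similarity between S_t and R_t over the training days."""
    X, y = factor_design(R, A)
    N = R.shape[0]
    S = (X @ beta).reshape(-1, N)  # one row per prediction day
    Y = y.reshape(-1, N)
    cos = np.sum(S * Y, axis=1) / (np.linalg.norm(S, axis=1) * np.linalg.norm(Y, axis=1))
    return float(cos.mean())


def benchmark(R, D=250, F=10, n_iter=1000, seed=0):
    """Random search over orthonormal A; returns (best score, A, beta)."""
    rng = np.random.default_rng(seed)
    best = (-np.inf, None, None)
    for _ in range(n_iter):
        A = gram_schmidt(rng.standard_normal((D, F)))
        beta = fit_beta(R, A)
        score = mean_cosine(R, A, beta)
        if score > best[0]:
            best = (score, A, beta)
    return best
```

With the challenge sizes, one would call `benchmark(R_train)` on a hypothetical $50\times 754$ array `R_train` of training returns; lowering `n_iter` gives a quicker trial run.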

Remark: The orthonormality condition for the vectors $A_1,\ldots,A_F$ reads $A^T A=I_F$ for the matrix $A:=[A_1,\ldots,A_F]$. The space of matrices satisfying this condition is known as the Stiefel manifold, a generalization of the orthogonal group, and one can show that the previous procedure generates a sample from the uniform distribution on this (compact, homogeneous) space.
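The same distribution can also be sampled with a QR factorization instead of an explicit Gram-Schmidt loop. For a full-rank matrix, the QR factorization whose $R$ factor has a positive diagonal is unique and coincides with the Gram-Schmidt output; `np.linalg.qr` may return negative diagonal entries, so the corresponding columns of $Q$ have their signs flipped below. A minimal sketch; `uniform_stiefel` is a hypothetical name.

```python
import numpy as np


def uniform_stiefel(D, F, rng):
    """Sample a D x F matrix with orthonormal columns, uniform on the Stiefel manifold.

    Equivalent to Gram-Schmidt on a D x F Gaussian matrix: after flipping column
    signs so that diag(R) > 0, the QR factor Q is exactly the Gram-Schmidt output.
    """
    M = rng.standard_normal((D, F))
    Q, R = np.linalg.qr(M)          # reduced QR: Q is D x F
    return Q * np.sign(np.diag(R))  # multiply column j by sign(R[j, j])


rng = np.random.default_rng(0)
A = uniform_stiefel(250, 10, rng)
print(np.allclose(A.T @ A, np.eye(10)))  # prints True
```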


Files


Files are accessible when logged in and registered for the challenge.


The challenge provider



Qube Research & Technologies Group is a quantitative and systematic investment manager employing around 300 people with offices in Hong Kong, London, Mumbai, Paris and Singapore. We are a technology-driven firm implementing a scientific approach to financial investment. QRT’s market presence is global and extends across the largest liquid electronic venues. The combination of data, research, technology and trading expertise has shaped our DNA and is at the heart of our innovation and development dynamic. The firm acts as an investment manager managing open-ended Funds used for management of third-party capital.