Napoleon Crypto/NaPoleonX is a company specialised in designing quantitative systematic investment solutions, i.e. investment solutions based on algorithms. Napoleon Asset Management ('NAM') is a subsidiary of Napoleon Crypto and a regulated asset management company that designs quantitative investment solutions, with cryptocurrencies as its investment universe. As part of its research into new investment solutions, NAM continually evaluates new machine learning and deep learning algorithms.
Challenge goals
The goal of the challenge is to predict the returns vs. bitcoin of clusters of cryptoassets.
At Napoleon, we are interested in detecting which assets are likely to move together in the same direction, i.e. assets whose returns (absolute price changes) are statistically positively correlated. Such assets are grouped into "clusters", which can be seen as the crypto equivalent of equity sectors or industries. Knowledge of such clusters can then be used to optimize portfolios, build advanced trading strategies (long/short absolute, market neutral), evaluate the systematic risk, etc. In order to build new trading strategies, it can be helpful to know whether a given sector/cluster will outperform the market, represented here by bitcoin. For this reason, given a cluster $C = \{A_1, \dots, A_n\}$ composed of $n$ assets $A_i$, this challenge aims at predicting the return relative to bitcoin of an equally weighted portfolio composed of $\{A_1, \dots, A_n\}$ in the next hour, given the series of returns over the last 23 hours for the assets in the cluster, as well as some metadata.
Data description
It is well known that if we want clusters to remain meaningful, they have to be updated on a regular basis as they evolve over time and change with market types (bull/bear, range, ...). As a result, the clustering is not time consistent, i.e. the relation between asset i and asset j is likely to change over time. Clusters are typically updated every week, however each clustering is considered valid for three weeks in this challenge. As a consequence, we build from each cluster 21 samples corresponding to 21 days.
Each sample is thus a pair {cluster, day}, where each cluster $C$ is composed of $n$ assets $A_1, \dots, A_n$. It is worth noting that the number of assets $n$ is not shared across clusters: it changes from cluster to cluster. For each asset $A_i$ and each sample day, the sequence of hourly returns for the first 23 hours of the day is provided. In addition, two other quantities (md and bc) are given, but their nature is kept undisclosed.
From these input data, the goal is to predict, for each sample {cluster, day}, the mean return of the assets in the cluster relative to bitcoin during the last hour of the day. The hourly returns provided are relative to the performance of bitcoin, since asset prices are assumed to be quoted in bitcoin: consequently, a positive return means that an asset has outperformed bitcoin, and a negative return that it has underperformed.
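Written as a formula, the target of a sample can be sketched as follows. The notation is introduced here only for illustration and is not part of the challenge data: $r_{i,h}$ denotes the return relative to bitcoin of asset $A_i$ during hour $h$ of the sample day, so that $h = 24$ is the last hour.

```latex
y_{C,\mathrm{day}} = \frac{1}{n} \sum_{i=1}^{n} r_{i,24}
```

The inputs supply $r_{i,1}, \dots, r_{i,23}$ for each asset of the cluster; $r_{i,24}$ is never observed directly, only its average over the cluster.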
Dates corresponding to the construction of a cluster or sample are not provided; it is therefore impossible to determine whether two clusters were built on the same date. Assets are also anonymised: given two clusters $C_1$ and $C_2$, the cryptocurrency labelled $A_1$ in $C_1$ need not match the cryptocurrency labelled $A_1$ in $C_2$, and $A_1$ of $C_1$ may not even be present in $C_2$.
Input Data
Input datasets comprise 29 columns, detailed below. Each line is indexed by a unique ID, which corresponds to a cluster (defined by "cluster"), a cluster sample day (defined by "day"), and a cryptocurrency (defined by "asset"). ret_1, ..., ret_23 are the returns relative to bitcoin for the first 23 hours of the day, arranged in chronological order. md and bc are the two undisclosed quantities.
Columns:
id
cluster
day
asset
md
bc
ret_1
...
ret_23


The input data contain a certain number of NaN values corresponding to missing data whose filling method is left to the discretion of participants.
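As one possible starting point for the missing values, the sketch below fills NaN returns with zero, i.e. it assumes "no move relative to bitcoin" for a missing hour. This is only an illustrative choice, not a method recommended by the challenge provider, and the toy frame stands in for the real input file.

```python
import numpy as np
import pandas as pd

ret_cols = [f"ret_{i}" for i in range(1, 24)]

# Toy frame standing in for the real input file (values are made up).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(0.0, 0.01, size=(4, 23)), columns=ret_cols)
toy.loc[1, "ret_5"] = np.nan
toy.loc[3, "ret_23"] = np.nan

# Assumption: a missing hourly return is treated as a zero return vs. bitcoin.
filled = toy.copy()
filled[ret_cols] = filled[ret_cols].fillna(0.0)
```

Other choices (for example the mean of the asset's available returns, or the cluster mean for that hour) fit the same pattern and may behave differently under the RMSE metric.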
Additional Data: each asset can be seen as a node in a graph that represents its cluster. For this reason, we also provide in the supplementary files a binary adjacency matrix $A \in \{0,1\}^{n \times n}$ for each cluster. Each entry $A_{ij}$ of this matrix labels the edge between node $i$ and node $j$: it indicates whether the co-movement relation between assets $i$ and $j$ is highly statistically significant ($A_{ij} = 1$) or not ($A_{ij} = 0$). All adjacency matrices are stored in the same pickle file: adjacency_matrices.pkl. This file can be opened with the Python package pickle:
import pickle
with open("adjacency_matrices.pkl", "rb") as file:
    adj = pickle.load(file)
adj is a dictionary whose keys are the indices of the clusters.
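To illustrate how such a matrix might be used, the sketch below counts the number of significant co-movement links per asset. The matrix `A` is a made-up stand-in for one entry of the dictionary; the stored type of the real entries is not specified here, so the helper `node_degrees` (a hypothetical name) only assumes that they can be converted to a NumPy array.

```python
import numpy as np

# Made-up stand-in for one adjacency matrix of a 4-asset cluster.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])

def node_degrees(matrix):
    """Number of significant co-movement links per asset (diagonal ignored)."""
    m = np.asarray(matrix).copy()
    np.fill_diagonal(m, 0)
    return m.sum(axis=1)

degrees = node_degrees(A)
# Assets with no significant link to any other asset of the cluster.
isolated = np.flatnonzero(degrees == 0)
```

Quantities such as the degree of each node could serve as per-asset weights or features when aggregating returns over a cluster.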
Output Data
Output datasets comprise only 2 columns:
sample_id: a unique sample identifier for a given {cluster, day} pair
target: a float, the mean return of the cluster's assets relative to bitcoin during the last hour of the day


The 'sample_id' identifier of each sample is computed as follows: sample_id = cluster × 21 + day
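A minimal sketch of this mapping is given below. The function names are hypothetical, and the inverse mapping is only valid under the assumption that day indices run from 0 to 20, which is not stated explicitly in this description.

```python
DAYS_PER_CLUSTER = 21

def make_sample_id(cluster: int, day: int) -> int:
    """Identifier of the {cluster, day} pair, as defined in the text."""
    return cluster * DAYS_PER_CLUSTER + day

def split_sample_id(sample_id: int):
    """Recover (cluster, day); assumes 0 <= day < 21."""
    return divmod(sample_id, DAYS_PER_CLUSTER)
```

If the assumption on the day range holds, each sample_id identifies a single {cluster, day} pair.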
Training and test data
Number of clusters: 2091, number of days per cluster: 21, total number of samples: 43627 ≈ 2045 × 21 (some samples were dismissed because of the lack of sufficient data). The training data contains 30494 cluster samples (1464 clusters, ≈70%), while the test data contains 13133 samples (627 clusters, ≈30%). Test samples correspond to dates that come after those of the training data.
The solution files submitted by participants shall follow exactly the same output data format as described above, with 'sample_id' identifiers. An example submission file containing random predictions is provided.
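The sketch below shows one way to lay out predictions in the two-column format described above. The prediction values are made up, and whether the platform expects sample_id as a regular column or as the index is an assumption here; the provided example submission file remains the reference.

```python
import io
import pandas as pd

# Made-up predictions, indexed by sample_id.
predictions = pd.Series([0.001, -0.002, 0.0], index=[0, 1, 21], name="target")
predictions.index.name = "sample_id"

# Two columns, in the order given in the output data description.
submission = predictions.reset_index()

# Written to an in-memory buffer here; a real run would write to a file path.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```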
The metric used to rank predictions submitted by participants is the Root Mean Squared Error (RMSE).
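For local validation, the RMSE can be computed as in the following sketch. The function name `rmse` is chosen here for illustration; the platform's own scoring code is not shown in this description.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Squared Error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Errors of 3 and 4 give sqrt((9 + 16) / 2) = sqrt(12.5).
score = rmse([0.0, 0.0], [3.0, 4.0])
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones.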
Benchmark description
For each cluster $C = \{A_1, \dots, A_n\}$, we compute for each asset $A_i$ the average of its past returns ret_1, ..., ret_23. Next, we compute the average of these averages over the assets in the cluster. In code, the benchmark can be described as follows:
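import pandas as pd

# Load the test inputs; the first column is used as the row index.
input_test = pd.read_csv("public/x_test.csv", index_col=0)
# Each {cluster, day} pair maps to one sample_id.
input_test["sample_id"] = 21 * input_test["cluster"] + input_test["day"]
ret_cols = ["ret_" + str(i) for i in range(1, 24)]
# Average each hourly return over the assets of a sample, then average over the 23 hours.
y_benchmark = pd.DataFrame(index=input_test.groupby("sample_id").mean().index)
y_benchmark["target"] = input_test.loc[:, ["sample_id"] + ret_cols].groupby("sample_id").mean().mean(axis=1)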
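The text describes averaging per asset first and then across assets, while the code averages per hour across assets first and then across hours. The sketch below, on a made-up toy sample without missing values, checks that the two orders give the same result; when NaN values are present, pandas skips them in each mean, so the two orders can differ slightly.

```python
import numpy as np
import pandas as pd

ret_cols = [f"ret_{i}" for i in range(1, 24)]

# Toy sample with 3 assets and no missing values (values are made up).
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(0.0, 0.01, size=(3, 23)), columns=ret_cols)
toy["sample_id"] = 0

# Order described in the text: average per asset, then across assets.
per_asset_first = toy[ret_cols].mean(axis=1).groupby(toy["sample_id"]).mean()

# Order used in the benchmark code: average per hour across assets, then across hours.
per_hour_first = toy.groupby("sample_id")[ret_cols].mean().mean(axis=1)

same = bool(np.allclose(per_asset_first, per_hour_first))
```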
Files
Files are accessible when logged in and registered to the challenge
The challenge provider
Napoleon Group is building the future of investing around three entities:
• Napoleon AM has been granted an AIFM license and will be able to offer investment solutions for professional investors. Napoleon AM will hence propose crypto exposure solutions;
• Napoleon Capital is a specialist in quantitative strategies issued from open-competition holding a financial advisor license CIF (Conseil en Investissement Financier/FSA equivalent) granted by ORIAS/AMF;
• Napoleon Index is aiming to become registered for index publishing and administration under BMR regulation, blockchain based.
Napoleon Group has a French DNA, with an international mindset and a strong will to comply with the highest standard regulations. Napoleon Group is betting on regulation to address institutional investors' needs.
AM industry evolution: The Group vision is to embrace 3 major developments that are reshaping the AM industry:
• Quantitative finance is revolutionizing the financial industry through automation and Artificial Intelligence;
• Blockchain simplifies many operational processes through increased speed, security, transparency and cost-efficiency; and
• Tokenization is the real potential future of financial assets. Crypto assets are being adopted by institutional clients as a new asset class. This will lead to a world of programmable real and financial assets, allowing to trade in an ever more efficient market.