Challenge Data

Prediction of direction of Bitcoin based on sentiment data
by Napoleon X




Description


Competitive challenge
Economic sciences
Finance
Classification
Time series
10MB to 1GB
Basic level

Dates

Started on Jan. 6, 2020


Challenge context

Description of the Company

Napoleon Crypto (NaPoleonX) is a company specialised in designing quantitative investment solutions, i.e. investment solutions based on algorithms. Napoleon Asset Management (NAM), a subsidiary of Napoleon Crypto, is a regulated asset management company that designs quantitative investment solutions with a particular focus on cryptocurrencies as its investment universe. As part of its research into new investment solutions, NAM is considering using non-financial data to discover decorrelated families of algorithms.



Challenge goals

Problem description

The problem is a classification challenge that aims at building investment strategies on cryptocurrencies based on sentiment extracted from news and social networks.

For each trading hour, we counted the occurrences of certain terms, for example "adoption" or "hack", in a selected set of influential Twitter accounts as well as in forums such as Bitcointalk. We created 10 different themes, some positive and some negative, and summed the counts of the words belonging to each theme before normalising them. For a given sample and a given theme, we took the counts for each of the last 48 hours, Z-scored them, and multiplied the result by the average hourly count over those 48 hours divided by the average hourly count over the whole training period. For a theme $T$ at timestamp $i$ with lag $k$ ($k \in [\![0;47]\!]$), the value $F$ of the feature is:

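$$F_{i,k}=\frac{T_{i,k}-\overline{T_{i}}}{\sqrt{\frac{1}{47}\sum_{j=0}^{47}\left(T_{i,j}-\overline{T_{i}}\right)^{2}}}\times\frac{\overline{T_{i}}}{\overline{T}}$$

where $\overline{T_{i}}=\frac{1}{48}\sum_{j=0}^{47}T_{i,j}$ is the average hourly count over the 48-hour window of sample $i$, and $\overline{T}$ is the average hourly count over the whole training period.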
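The feature construction above can be sketched in NumPy for a single theme. This is a minimal illustration, not the provider's code: `theme_features` is a hypothetical helper, the counts are synthetic, and the handling of a window with constant counts (zero standard deviation) is an assumption, since the description does not cover that case.

```python
import numpy as np

def theme_features(counts, train_mean):
    """Return F[i, k] for one theme.

    counts: array of shape (n_samples, 48), hourly counts T[i, k] for lags k = 0..47.
    train_mean: average hourly count of the theme over the whole training period.
    """
    counts = np.asarray(counts, dtype=float)
    window_mean = counts.mean(axis=1, keepdims=True)  # mean over the 48-hour window
    # ddof=1 gives the 1/47 = 1/(48 - 1) factor used in the formula.
    window_std = counts.std(axis=1, ddof=1, keepdims=True)
    # Assumption: a window with constant counts (zero std) maps to 0;
    # the challenge description does not say how this case is handled.
    z = np.divide(counts - window_mean, window_std,
                  out=np.zeros_like(counts), where=window_std > 0)
    return z * (window_mean / train_mean)

rng = np.random.default_rng(0)
demo_counts = rng.poisson(5.0, size=(3, 48))  # synthetic counts, not challenge data
F = theme_features(demo_counts, train_mean=4.0)
print(F.shape)  # (3, 48)
```

Each row of `F` has mean zero, and its sample standard deviation equals the scale factor $\overline{T_i}/\overline{T}$. Hours with unusually high activity for the theme therefore produce larger feature magnitudes.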

We added 5 features corresponding to the price return over the last hour and the last 6, 12, 24 and 48 hours, normalised by the volatility over the 48 hours. The aim is to predict whether the return of Bitcoin over the next hour will be above 0.2%, between -0.2% and 0.2%, or below -0.2%. The 0.2% level is the 66.7th percentile of the distribution.
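The three-way labelling rule can be sketched as follows. This minimal illustration makes two assumptions that the description does not state: the next-hour return is a simple decimal return (0.002 = 0.2%), and returns exactly at ±0.2% fall in the middle class. `direction_label` is a hypothetical helper.

```python
import numpy as np

THRESHOLD = 0.002  # 0.2% expressed as a decimal return

def direction_label(next_hour_return, threshold=THRESHOLD):
    """Map next-hour returns to the classes -1, 0 and 1."""
    r = np.asarray(next_hour_return, dtype=float)
    # Assumption: moves exactly at +/-threshold are put in the middle class (0).
    return np.where(r > threshold, 1, np.where(r < -threshold, -1, 0))

returns = np.array([0.0031, -0.0005, -0.0042, 0.002])  # illustrative values
labels = direction_label(returns)
print(labels.tolist())  # [1, 0, -1, 0]
```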

The metric used for this problem is the logistic loss (log loss), defined as the negative log-likelihood of the true labels given the classifier's predictions. The true labels are encoded as a 1-of-3 binary indicator matrix $Y$, i.e. $y_{i,k}=1$ if sample $i$ has label $k$ taken from a set of 3 labels (less than -0.2%, between -0.2% and +0.2%, more than +0.2%), and $y_{i,k}=0$ otherwise. For $P$ a matrix of probability estimates with $p_{i,k}=\Pr(y_{i,k}=1)$, the log loss is defined as

$$L_{\log}(Y,P)=-\log\Pr(Y \mid P)=-\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{3}y_{i,k}\log p_{i,k}$$

The lower the score, the better.
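The formula translates directly into NumPy. This is a sketch for local evaluation, not the platform's scorer. The clipping constant `eps` is an assumption added to avoid $\log 0$; the description does not mention it.

```python
import numpy as np

def multiclass_log_loss(y_onehot, proba, eps=1e-15):
    """Mean negative log-likelihood of one-hot labels Y under probabilities P."""
    y = np.asarray(y_onehot, dtype=float)
    # Clipping avoids log(0); the eps value is an assumption, not taken from the challenge rules.
    p = np.clip(np.asarray(proba, dtype=float), eps, 1.0)
    return float(-np.mean(np.sum(y * np.log(p), axis=1)))

Y = np.array([[0, 0, 1],
              [1, 0, 0]])
P = np.array([[0.2, 0.3, 0.5],
              [0.6, 0.3, 0.1]])
print(round(multiclass_log_loss(Y, P), 4))  # 0.602
```

A constant prediction of 1/3 for every class scores $\ln 3 \approx 1.0986$. That value is a natural reference point that any useful model should beat.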



Data description


The input data contains 10 time series over 48 trading hours, representing complementary features based on sentiment analysis of Bitcoin-related content extracted from Twitter and forums such as Bitcointalk, and 5 indicators based on the variation of the Bitcoin price over the past 1, 6, 12, 24 and 48 hours, normalised by the volatility over the period. Input data for training and testing are provided as .csv files whose first line contains the header. Each subsequent line corresponds to a sample and each column to a feature. The features are the following:

  • ID: Id of the sample which is linked to the ID of the output file;
  • I_1_lag(k) to I_10_lag(k): values of indicators I_1 to I_10 for each lag $k$ ($k \in [\![0;47]\!]$), i.e. the normalised value of each indicator for each of the past 48 trading hours;
  • X_1 to X_5: Values of 5 normalised indicators representing price variation of Bitcoin on the last 1, 6, 12, 24 and 48 hours.
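For modelling, it is convenient to reshape the 480 sentiment columns into a (samples, themes, lags) array. The sketch below assumes a hypothetical column-naming pattern `I_<theme>_lag<k>` (for example `I_3_lag5`). The exact header spelling is not given here and should be checked against the actual CSV file. `split_features` is likewise a hypothetical helper.

```python
import numpy as np

N_THEMES, N_LAGS = 10, 48

def sentiment_columns():
    # Hypothetical naming pattern "I_<theme>_lag<k>"; check the real CSV header.
    return [f"I_{t}_lag{k}" for t in range(1, N_THEMES + 1) for k in range(N_LAGS)]

PRICE_COLUMNS = [f"X_{j}" for j in range(1, 6)]

def split_features(header, data):
    """Return sentiment of shape (n, 10, 48) and price features of shape (n, 5)."""
    index = {name: pos for pos, name in enumerate(header)}
    data = np.asarray(data, dtype=float)
    sent_idx = [index[c] for c in sentiment_columns()]  # theme-major, then lag
    price_idx = [index[c] for c in PRICE_COLUMNS]
    sentiment = data[:, sent_idx].reshape(len(data), N_THEMES, N_LAGS)
    return sentiment, data[:, price_idx]

header = ["ID"] + sentiment_columns() + PRICE_COLUMNS
demo = np.arange(2 * len(header), dtype=float).reshape(2, len(header))  # synthetic rows
sentiment, price = split_features(header, demo)
print(sentiment.shape, price.shape)  # (2, 10, 48) (2, 5)
```

Because the columns are gathered theme by theme, `sentiment[i, t - 1, k]` holds indicator `I_t` at lag `k` for sample `i`.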

There will be 14 000 samples for the train set and 5 000 for the test set. For a given sample, the time series (for the 10 sentiment indicators) are given over the same 48 trading hours.

The training outputs are given in a .csv file. Each line corresponds to a sample:

  • ID: Id of the sample;
  • Target_-1: classification of the return of Bitcoin in the next hour. -1 signifies a down move below -0.2%;
  • Target_0: classification of the return of Bitcoin in the next hour. 0 signifies a move between -0.2% and +0.2%;
  • Target_1: classification of the return of Bitcoin in the next hour. 1 signifies an up move above +0.2%.
ID      Target_-1   Target_0   Target_1
0000    0           0          1
0001    1           0          0
0002    0           0          1
...     ...         ...        ...
13999   0           1          0
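Converting between the one-hot target columns and a single class label is a common preprocessing step. This sketch assumes the three columns appear in the order Target_-1, Target_0, Target_1 shown in the example rows. `onehot_to_label` and `label_to_onehot` are hypothetical helpers.

```python
import numpy as np

CLASSES = np.array([-1, 0, 1])  # assumed column order: Target_-1, Target_0, Target_1

def onehot_to_label(targets):
    """Turn rows of three 0/1 indicators into class labels -1, 0 or 1."""
    targets = np.asarray(targets)
    if not np.all(targets.sum(axis=1) == 1):
        raise ValueError("each row must contain exactly one 1")
    return CLASSES[targets.argmax(axis=1)]

def label_to_onehot(labels):
    """Inverse mapping: class labels back to one-hot rows."""
    labels = np.asarray(labels)
    return (labels[:, None] == CLASSES[None, :]).astype(int)

rows = np.array([[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]])  # the example rows above
labels = onehot_to_label(rows)
print(labels.tolist())  # [1, -1, 1, 0]
```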


Benchmark description

Benchmark

The benchmark is a logistic regression on features X_1 to X_5.
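The page does not say how the benchmark was implemented: the library, solver and regularisation are all unspecified. The sketch below uses a plain multinomial (softmax) logistic regression fitted by batch gradient descent in NumPy as a self-contained stand-in. In practice a library implementation such as scikit-learn's `LogisticRegression` would typically be used. The learning rate, iteration count and small L2 penalty are assumptions, and the data are synthetic.

```python
import numpy as np

def predict_proba(X, W, b):
    """Softmax class probabilities for features X."""
    logits = X @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    expz = np.exp(logits)
    return expz / expz.sum(axis=1, keepdims=True)

def fit_softmax_regression(X, y_onehot, lr=0.1, n_iter=2000, l2=1e-3):
    """Fit multinomial logistic regression by batch gradient descent on the mean log loss."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(y_onehot, dtype=float)
    n, d = X.shape
    W = np.zeros((d, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(n_iter):
        G = (predict_proba(X, W, b) - Y) / n  # gradient of the mean log loss w.r.t. the logits
        W -= lr * (X.T @ G + l2 * W)          # small L2 penalty: an assumption
        b -= lr * G.sum(axis=0)
    return W, b

# Synthetic stand-in for X_1..X_5 and the three classes (not challenge data).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(600, 5))
true_W = rng.normal(size=(5, 3))
idx = (X_demo @ true_W + rng.gumbel(size=(600, 3))).argmax(axis=1)  # class index 0, 1, 2
Y_demo = np.eye(3)[idx]

W, b = fit_softmax_regression(X_demo, Y_demo)
P_demo = predict_proba(X_demo, W, b)
train_loss = -np.mean(np.log(P_demo[np.arange(len(idx)), idx]))
print(f"train log loss {train_loss:.4f} vs uniform {np.log(3):.4f}")
```

With zero-initialised weights, the model starts exactly at the uniform-guess loss of $\ln 3$. Gradient descent can only improve on it from there.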



Files


Files are accessible when logged in and registered for the challenge


The challenge provider



Napoleon Group is building the future of investing around three entities:
  • Napoleon AM has been granted an AIFM license and will be able to offer investment solutions to professional investors, including crypto exposure solutions;
  • Napoleon Capital is a specialist in quantitative strategies sourced from open competitions, holding a financial advisor license CIF (Conseil en Investissement Financier, FSA equivalent) granted by ORIAS/AMF;
  • Napoleon Index aims to become registered for blockchain-based index publishing and administration under the BMR regulation.
Napoleon Group has a French DNA, an international mindset and a strong will to comply with the highest regulatory standards. The Group is betting on regulation to address institutional investors' needs.
AM industry evolution: the Group's vision is to embrace three major developments that are reshaping the asset management industry:
  • quantitative finance is revolutionizing the financial industry through automation and artificial intelligence;
  • blockchain simplifies many operational processes through increased speed, security, transparency and cost-efficiency;
  • tokenization is the real potential future of financial assets.
Crypto assets are being adopted by institutional clients as a new asset class. This will lead to a world of programmable real and financial assets, allowing trading in an ever more efficient market.


Congratulations to the winners of the challenge


1 -
2 Romain Poncet
3 Christophe Leroux

You can find the whole list of winners of the season here