Review of deep learning models for crypto price prediction: implementation and evaluation (2024)

Jingyang Wu, Xinyi Zhang, Fangyixuan Huang, Haochen Zhou, Rohitash Chandra

Abstract

Investors and researchers have shown great interest in accurate cryptocurrency price forecasting models. Deep learning models are prominent machine learning techniques that have transformed various fields and have shown potential for finance and economics. Although various deep learning models have been explored for cryptocurrency price forecasting, it is not clear which models are suitable, given the high market volatility. In this study, we review the literature on deep learning for cryptocurrency price forecasting and evaluate novel deep learning models for cryptocurrency price prediction. Our deep learning models include variants of long short-term memory (LSTM) recurrent neural networks, variants of convolutional neural networks (CNNs), and the Transformer model. We evaluate univariate and multivariate approaches for multi-step-ahead prediction of cryptocurrency close prices. Our results show that the univariate LSTM model variants perform best for cryptocurrency predictions. We also carry out a volatility analysis of four cryptocurrencies, which reveals significant fluctuations in their prices throughout the COVID-19 pandemic. Additionally, we investigate the prediction accuracy in two scenarios defined by different training sets. First, we use pre-COVID-19 data to forecast cryptocurrency close prices during the early period of COVID-19. Second, we use data from the COVID-19 period to predict prices for 2023 to 2024.

keywords:

cryptocurrency, deep learning, time series prediction


UNSW Sydney, Sydney, Australia

1 Introduction

The traditional financial ecosystem is implemented through a complex set of policies and structural mechanisms that financial institutions use to create currency within an economy [1]. At the core of this ecosystem are the central bank, the treasury, and commercial banking entities, which operate under three primary monetary frameworks: commodity-based [2], commodity-backed [3], and fiat currency systems [4]. Triggered by flaws in these institutions, such as inflationary propensities and transactional inefficiencies [5], the digitisation of currency has become a revolution [6]. Cryptocurrencies aim to rectify imperfections of the existing system [7] by addressing inflation, improving financial stability and transactional efficiency, and reducing operational costs. A cryptocurrency is a peer-to-peer digital exchange system where cryptographic techniques are employed to create and distribute units of currency among participants [8, 9]. The cryptocurrency market has seen rapid and unpredictable changes over its relatively brief existence [10]. The security of the cryptocurrency market relies on blockchain technology [11]. As of 2024, there are over 5,000 cryptocurrencies and 5.8 million active users in the cryptocurrency industry [12]. Because it combines cryptography with a monetary unit, Bitcoin (BTC) became one of the most popular cryptocurrencies and received attention in fields such as computer science, economics, and cryptography [13]. Satoshi Nakamoto pseudonymously introduced Bitcoin and released it as open-source software in January 2009 [8]. The cryptocurrency ecosystem, encompassing Bitcoin and altcoins along with tokens such as Civic and BitDegree, marks a significant stride towards a decentralised financial system.
The cryptocurrency ecosystem refers to the broader infrastructure and community that encompasses various cryptocurrencies and blockchain projects, whereas a “token”, such as BitDegree or Civic, serves specific functions within these ecosystems, often facilitating access to services or representing certain assets. Unlike Bitcoin, which is primarily a digital currency intended for transactions and value storage, BitDegree serves a distinct purpose by focusing on education, offering tokens as incentives within its educational platform. Nevertheless, due to its decentralised nature and absence of governmental support, the cryptocurrency market is susceptible to significant fluctuations in value and the formation of pricing bubbles [14].

The inherent volatility of cryptocurrencies, featuring transaction volume fluctuations and price variability, complicates the predictive analysis of cryptocurrency prices [15]. However, volatility [16] also makes it a profitable market for speculation, as it is the source of potential gain. Prominent cryptocurrencies such as Bitcoin (BTC), Ethereum (ETH), and Litecoin (LTC) differ in valuation, transaction speed, usage, and volatility [17]. Identifying the precise catalysts for price trends in the cryptocurrency domain remains elusive due to the sector’s pronounced volatility. Nevertheless, the market value of cryptocurrencies is projected to increase, with an expected compound annual growth rate of 11.1% [18]. Meanwhile, the financial audit sector is evolving to integrate cryptocurrencies as a valid transaction medium. Investors have previously encountered challenges due to price bubbles resulting in extreme fluctuations [19]. To overcome these obstacles, a dependable model is needed that can aid market participants in identifying trends and generating accurate predictions. Predicting cryptocurrency prices with precision is difficult due to their sensitivity to multiple factors, including government policies, technological advancements, public perception, and world events [20]. Murray et al. [21] highlight the inherent difficulties in predicting the pricing of cryptocurrencies because of their high volatility, decentralised nature, and other distinctive features such as transaction speed and variations in their ecosystems.

Several researchers have affirmed the correlation between cryptocurrencies and other domains such as economics, finance, the internet, and even politics. Wang et al. [22] presented an analysis using machine learning models and revealed a strong correlation between cryptocurrencies and their intrinsic features (e.g., lagged volatility, previous trading information). Kyriazis [23] studied spillover effects in cryptocurrency markets, emphasising Bitcoin’s role using statistical models such as vector autoregression (VAR) [24] and generalised autoregressive conditional heteroskedasticity (GARCH) [25] to explain inter-market dynamics. Huynh et al. [26] revealed that gold can be used as a reliable tool to reduce the risk associated with unpredictable changes in the cryptocurrency market when utilised as a separate form of currency. However, investors are both enthusiastic and cautious due to the highly volatile cryptocurrency market. Machine learning and deep learning models are promising for cryptocurrency forecasting due to their predictive capabilities and their ability to model multimodal [27] and spatiotemporal data [28] and to forecast time series [29].

Machine learning and deep learning models have shown great potential in temporal forecasting problems for various domains, such as climate extremes [30], energy [31], and financial time series [32]. Deep learning models can assist in forecasting future cryptocurrency prices, although there are challenges due to the nonlinear and volatile nature of the time series. Many researchers have used long short-term memory (LSTM) networks and their variants to predict cryptocurrency prices [33, 34, 35]. Deep learning methods such as LSTM recurrent neural networks [36, 37], convolutional neural networks (CNN) [38], and Transformer models [39] are also promising for predicting cryptocurrency prices. Chandra et al. [40] conducted a comparative analysis of various deep learning models for multi-step-ahead time series prediction. A myriad of factors, both internal and external, such as trading volume, market beta, and volatility, play a critical role in determining cryptocurrency value. Therefore, we need to utilise highly correlated cryptocurrencies in deep learning models and assess both univariate and multivariate deep learning models.

In this paper, we provide a detailed review of the literature on cryptocurrency price forecasting using deep learning models and then evaluate novel deep learning models for cryptocurrency price forecasting. Specifically, we utilise variants of long short-term memory (LSTM) recurrent neural networks, variants of convolutional neural networks (CNNs), and the Transformer model. We evaluate univariate and multivariate approaches for multi-step-ahead prediction of cryptocurrency close prices. Our results show that the univariate LSTM model variants perform best for cryptocurrency predictions. We also carry out a volatility analysis of four cryptocurrencies and investigate the prediction accuracy in two scenarios defined by different training sets. First, we use pre-COVID-19 data to forecast cryptocurrency close prices during the early period of COVID-19. Second, we use data from the COVID-19 period to predict prices for 2023 to 2024. We investigate the effect of univariate and multivariate models, where the multivariate model features the gold price, the close, open, and high prices of the cryptocurrency being predicted, and the price index of a highly correlated cryptocurrency.

The rest of the paper is organised as follows. Section 2 provides a comprehensive overview and analysis of previous research and literature relating to the topic. Section 3 provides the framework that compares selected deep learning models. Section 4 presents the experiments and results. Section 5 presents a discussion, and Section 6 concludes the paper.

2 Review

Forecasting financial time series is highly favoured by researchers in both academic and financial sectors due to its wide applications and significant influence. Machine learning and deep learning have paved the way for numerous models, leading to a large body of published research. Among these areas of interest, cryptocurrency price prediction stands out. This section offers an in-depth overview of how machine learning and deep learning are applied to financial time series forecasting, especially for predicting cryptocurrency prices, without using complex terminology.

2.1 Financial time series prediction

Financial time series forecasting has emphasised predicting asset prices [41]. Although methodologies are diverse, the key focus has been on predicting the future movements of the underlying asset with deep learning models [42]. This field covers a variety of subjects, including forecasting of stock prices, indices, and forex prices, as well as predictions for commodities (such as oil and gold), bond prices, volatility, and cryptocurrency prices [43]. Despite the wide range of topics, the underlying principles applied in these forecasts are applicable across all categories.

Research within financial time series forecasting is broadly segregated into two categories based on precise price forecasting and trend (directional movement) forecasting [44]. Although exact price prediction aligns with regression tasks, the primary goal in numerous financial forecasting projects is not the accurate prediction of prices, but rather the correct identification of price movement direction. This shifts the emphasis towards trend prediction, or determining the directional change in prices, marking it as a more critical area of investigation compared to pinpoint price forecasting. Hence, trend prediction is approached as a classification issue. Some analyses focus on binary outcomes, addressing only upward/downward movements [45], while others incorporate a third class (neutral option), thus constituting a 3-class problem [46].
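To illustrate how trend prediction becomes a classification problem, the following minimal sketch turns a price series into class labels. The function name `trend_labels` and the `threshold` band are hypothetical choices for illustration and are not taken from the cited studies.

```python
def trend_labels(prices, threshold=0.0):
    """Label each step as 'up', 'down', or 'neutral' from consecutive prices.

    `threshold` is an assumed relative-change band. With threshold=0.0 only
    exactly unchanged prices are 'neutral' (close to the binary setting);
    a positive threshold gives the 3-class setting.
    """
    labels = []
    for prev, curr in zip(prices, prices[1:]):
        change = (curr - prev) / prev  # relative price change
        if change > threshold:
            labels.append("up")
        elif change < -threshold:
            labels.append("down")
        else:
            labels.append("neutral")
    return labels


prices = [100.0, 101.0, 100.9, 98.0, 98.0]
binary = trend_labels(prices)                         # zero-width neutral band
three_class = trend_labels(prices, threshold=0.005)   # 0.5% neutral band
print(binary)
print(three_class)
```

A classifier would then be trained on these labels instead of the raw prices, and evaluated with classification metrics such as accuracy or F1-score rather than regression errors.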

In recent years, researchers have utilised machine learning and deep learning for the analysis of financial time series data. Nabipour et al. [45] conducted a comparative analysis of deep learning models (simple recurrent neural networks (RNNs) [47] and LSTM networks [36]) with machine learning models for stock market trend prediction, demonstrating the superior accuracy of deep learning. Mehtab et al. [48] enhanced prediction of the NIFTY-50 (https://www.nseindia.com/) Indian stock index using LSTM models with grid search and walk-forward validation, and achieved notable accuracy. NIFTY-50 represents the weighted average of 50 of the top companies listed on the National Stock Exchange (NSE) of India. Rezaei et al. [49] combined deep learning with frequency decomposition methods, including empirical mode decomposition (EMD) [50] and complete ensemble empirical mode decomposition (CEEMD) [51], to predict stock prices and demonstrated the effectiveness of CEEMD-CNN-LSTM and EMD-CNN-LSTM. Jing et al. [52] developed a hybrid model that merges deep learning with investor sentiment analysis, utilising CNN for sentiment classification and LSTM for stock price prediction, demonstrating enhanced predictive accuracy for stock prices. Mehtab and Sen [53] used a blend of machine learning and deep learning models with walk-forward validation and grid search for precise short-term forecasting, rather than long-term trend forecasting, of NIFTY-50, offering valuable insights for short-term traders. Li and Pan [54] enhanced stock price prediction accuracy by employing an ensemble deep learning model that leveraged stock prices and news data, using LSTM and gated recurrent unit (GRU) networks. Kanwal et al. [55] introduced a hybrid deep learning model combining bidirectional LSTM and one-dimensional CNN for stock price prediction, achieving higher accuracy and efficiency on five distinct stock datasets. Swathi et al. [56] presented a novel model for stock price prediction, leveraging Twitter sentiment analysis with a reported accuracy of 94.73%, showcasing its effectiveness over traditional and other deep learning methods. Ben Ameur et al. [57] utilised deep learning models (LSTM, GRU, RNN, and CNNs) to forecast commodity prices for the Bloomberg Commodity Index, demonstrating the superior performance of LSTM models. Baser et al. [58] evaluated gold price prediction using tree-based models, including Decision Trees, AdaBoost, Random Forest, Gradient Boosting, and XGBoost. They demonstrated XGBoost’s superior accuracy through technical indicator analysis. Deepa et al. [59] used statistical and machine learning models for prediction of cotton prices in India and reported that boosted decision tree regression provided the highest accuracy. Zhao and Yang [60] proposed an integrated deep learning framework for stock price movement prediction, which combined sentiment analysis with deep learning models and achieved enhanced prediction accuracy by incorporating both market data and investor sentiment.
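Walk-forward validation, mentioned above for [48] and [53], evaluates a model on successive future blocks while only ever training on earlier data. The sketch below shows one common variant with an expanding training window; the function name and the parameters `initial_train` and `test_size` are assumptions for illustration, and the exact scheme used in the cited studies may differ.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Yield (train_indices, test_indices) with an expanding training window."""
    start = initial_train
    while start + test_size <= n_samples:
        # Train on everything before `start`, test on the next block.
        yield list(range(0, start)), list(range(start, start + test_size))
        start += test_size  # the tested block joins the next training set


splits = list(walk_forward_splits(n_samples=10, initial_train=4, test_size=2))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx}")
```

Unlike random k-fold cross-validation, no split lets the model see observations that come after the test block, which matters for time series.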

Table 1 provides a list of sample studies that use statistical, machine learning, and deep learning methods for financial time series prediction. We report the models along with the metrics used, such as the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE). We also list the time periods of the data used in these studies.

Methods	Data	Target predictor	Time range (month/day/year)	Metric
LSTM, grid-search [48]	NIFTY 50 index	Price prediction	12/29/2014-07/31/2020	RMSE
MLP, RNN, LSTM [45]	Stock market trends	Trend prediction	11/01/2009-11/01/2019	F1-score, Accuracy, ROC-AUC
LSTM, CNN, empirical mode decomposition, CEEMD [49]	Stock prices	Price prediction	01/01/2010-09/01/2019	RMSE, MAE, MAPE
CNN, LSTM [52]	Stock prices	Price prediction	01/01/2017-07/01/2019	MAPE
Linear Regression, Bagging, XGBoost, Random Forests, MLP, SVM, LSTM [53]	NSE stock prices	Short-term price prediction	-	Comparative analysis
LSTM with sentiment analysis [54]	Stock prices	Price prediction	12/31/2017-06/01/2018	MSE, Precision, Recall, F1-score
BD-LSTM, 1D-CNN [55]	Stock prices	Price prediction	01/01/2000-12/31/2020	Accuracy, Efficiency
Teaching-learning-based optimization LSTM [56]	Stock prices	Price prediction	-	Accuracy, Precision, Recall, F1-score
LSTM, Gated Recurrent Units, RNN, CNN [57]	Bloomberg Commodity Index	Price prediction	01/01/2002-12/31/2020	Accuracy
Decision Tree, AdaBoost, Random Forests, Gradient Boosting, XGBoost [58]	Gold prices	Price prediction	11/18/2011-01/01/2019	MAE, MSE, RMSE, R2 score
Logistic Regression, Bayesian Linear Regression, Boosted Decision Tree Regression, Random Forest Regression, Poisson Regression [59]	Agriculture material prices	Price prediction	-	MAE, RMSE, RAE, R-square
LSTM, Ensemble CNN, Denoising Autoencoder, Sentiment Analysis [60]	Stock prices and sentiment	Price movement	01/01/2002-12/31/2020	RMSE
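For reference, the three error measures named with Table 1 (MAE, MAPE, and RMSE) can be computed as in the following minimal sketch using their standard definitions. The sample values are made up purely for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error (in %); undefined if any actual value is 0."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalises large errors more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [100.0, 200.0, 400.0]      # illustrative values only
predicted = [110.0, 190.0, 400.0]
print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

MAE and RMSE are expressed in the units of the price itself, so they are not directly comparable across assets with very different price levels, whereas MAPE is scale-free.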

2.2 Cryptocurrency prediction

Some researchers have employed machine learning models for cryptocurrency prediction, such as simple neural networks (SNN), also known as backpropagation or artificial neural networks [61]; support vector machines (SVM) [62]; genetic algorithm-based SNN [63]; and neuroevolution of augmenting topologies (NEAT) [64], which evolves both the architecture and the parameters of a neural network.

Next, we review some of the machine learning models that are pivotal in predicting cryptocurrency prices. Greaves and Au [65] demonstrated the superiority of neural networks over linear regression, logistic regression, and support vector machines (SVM) [66] for Bitcoin price prediction. Sovbetov [67] examined the effect of market factors, including the S&P 500 Index, on various cryptocurrencies using the autoregressive distributed lag (ARDL) technique. Guo et al. [68] improved short-term Bitcoin volatility forecasting with temporal mixture models, outperforming traditional methods. Akcora et al. [69] investigated the predictive Granger causality of chainlets and identified certain types of chainlets that exhibit the highest predictive influence on Bitcoin price and investment risk. Roy et al. [70] used ARIMA, autoregressive, and moving average models to forecast short-run volatility in Bitcoin’s weighted costs. Derbentsev et al. [71] compared binary autoregressive tree (BART), ARIMA, and autoregressive fractional integrated moving average (ARFIMA) models for forecasting Bitcoin, Ethereum, and Ripple prices, where BART had the best accuracy. Kumar et al. [72] and Latif et al. [73] examined the effectiveness of LSTM and ARIMA models in the short-term prediction of BTC prices, demonstrating that while ARIMA models can capture the general trend, LSTM models excel in predicting both the direction and magnitude of price movements, highlighting the potential of deep learning in financial market predictions. Maleki et al. [74] used machine learning models including linear regression, gradient boosting regressor (GBR), support vector regressor (SVR), random forest regressor (RFR), and ARIMA to predict Bitcoin prices, suggesting new investment strategies in the cryptocurrency market.

Table 2 provides a list of studies focused on using traditional statistical and machine learning methods to predict cryptocurrency trends. The table reports metrics such as the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE). We also mention the time periods of data used in these papers.

Methods	Cryptocurrency (type)	Target predictor	Time range (month/day/year)	Metric
Linear regression, Logistic regression, Neural Networks, SVM [65]	BTC	Future price	prior-07/04/2013	MSE, Accuracy
Autoregressive Distributed Lag [67]	BTC, ETH, Dash, LTC, Monero	Short- and long-term price	01/01/2010-01/12/2018	ADF test price
Temporal mixture models [68]	BTC	Short-term volatility	09/01/2015-04/01/2017	RMSE, MAE
k-Chainlets [69]	BTC	Close price	01/01/2009-01/01/2018	RMSE, wallet gain
ARIMA, Autoregressive, Moving average [70]	BTC	Market price	07/31/2013-08/01/2017	RMSE
BART, ARIMA, ARFIMA [71]	BTC, Ripple, ETH	Short-term price	01/01/2017-03/01/2019	RMSE
ARIMA, LSTM [72]	ETH	Close price	01/01/2016-12/31/2021	MSE
ARIMA, LSTM [73]	BTC	Short-term price	12/21/2020-12/21/2021	MAPE, MAE, RMSE
Logistic Regression, Gradient boosting regressor, SVR, Random forest regressor, ARIMA [74]	BTC, ETH, ZEC, LTC	Close price	04/01/2018-03/31/2019	MSE, MAPE, MAE, AIC, BIC

2.3 Deep learning models for cryptocurrency prediction

In recent years, deep learning models have been prominent in the prediction of cryptocurrencies, as follows. Jiang and Liang [38] combined CNNs with reinforcement learning [75] for portfolio management, utilising historical cryptocurrency pricing data to allocate assets optimally within specified portfolio constraints. Wu et al. [35] improved Bitcoin prediction accuracy by using autoregressive characteristics in an LSTM network, outperforming the standard LSTM. Lee et al. [76] introduced a novel approach employing inverse reinforcement learning coupled with agent-based modelling for Bitcoin price prediction. Ly et al. [77] employed LSTM networks to predict Bitcoin trends, demonstrating the models’ capability to forecast price changes and classify market movements with varying degrees of accuracy. Saad et al. [13] found LSTM to be the most accurate in forecasting Bitcoin prices compared to various machine learning models. Lucarelli and Borrotti [78] investigated automated cryptocurrency trading using deep reinforcement learning, employing double deep Q-learning networks trained by Sharpe ratio rewards, which outperformed traditional models in Bitcoin trading. Lahmiri and Bekiros [79] compared LSTM networks with generalised regression neural networks (GRNN) to forecast cryptocurrency prices, revealing the chaotic dynamics and fractality in digital currencies’ time series. Patel et al. [80] introduced a hybrid LSTM with gated recurrent unit model for Litecoin and Monero and achieved higher accuracy than a simple LSTM model. Livieris et al. [33] combined deep learning and ensemble learning to forecast trends and prices of Bitcoin, Ethereum, and Ripple. LSTM, bidirectional LSTM, and CNN models demonstrated the capability to deliver precise and dependable predictions. Marne et al. [81] used RNN and LSTM models to predict Bitcoin prices and reported better results than machine learning models. Nasekin and Chen [82] analysed cryptocurrency investor sentiment using CNN for sentiment classification and index construction from StockTwits messages. Sridhar and Sanagavarapu [39] employed a Transformer model for Dogecoin price prediction, demonstrating the model’s capability to capture both short-term and long-term dependencies effectively. Betancourt and Chen [83] proposed the use of deep reinforcement learning (DRL) [84] for the dynamic management of cryptocurrency asset portfolios, accommodating portfolios comprising an evolving number of cryptocurrency assets. Shahbazi and Byun [85] applied reinforcement learning for forecasting Litecoin and Monero market values. D’Amato et al. [86] employed a Jordan RNN to enhance the prediction of cryptocurrency volatility, demonstrating superior accuracy over traditional machine learning models for Bitcoin, Ripple, and Ethereum. Schnaubelt [87] applied reinforcement learning to develop cryptocurrency trading strategies. Parekh et al. [88] combined LSTM and sentiment analysis to predict cryptocurrency prices. The study integrated market sentiments from social media for enhanced forecasting accuracy. Kim et al. [89] applied a self-attention-based multiple LSTM model and improved the prediction accuracy for Bitcoin. Goutte et al. [90] used LSTM networks with technical analysis to enhance cryptocurrency trading strategies, particularly focusing on Bitcoin.

Table 3 provides an overview of sample research that focuses on applying deep learning techniques to forecast the trend of cryptocurrencies. The table reports metrics such as the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE). We also mention the time periods of data used in these papers.

Deep learning techniques	Cryptocurrency (type)	Target predictor	Time range (month/day/year)	Metric
CNN, Reinforcement Learning [38]	BTC, ETH, XRP	Close price	01/01/2018-02/28/2019	RMSE, Accuracy, AUC, F1
LSTM with autoregressive characteristics [35]	BTC	Short- and long-term price	01/01/2018-07/28/2018	MSE, RMSE, MAPE
Inverse Reinforcement Learning, Agent-based Model [76]	BTC	Close price	09/01/2016-07/31/2017	MAE, MSE, RMSE, MAPE
RNN, LSTM [77]	BTC	Trend prediction	-	-
Reinforcement Learning, LSTM, Conjugate Gradient [13]	BTC, ETC	Close price	10/01/2015-05/01/2018	RMSE, MAE
Deep Reinforcement Learning [78]	BTC	Trading	-	Profit-based metrics
Chaotic Neural Networks [79]	BTC, DASH, XRP	Price forecasting	07/16/2010-10/01/2018	-
LSTM with Gated Recurrent Units [80]	LTC, Monero	Short- and long-term price	01/30/2015-02/23/2020	Accuracy
LSTM, BD-LSTM, CNN [33]	BTC, ETH, XRP	Short- and long-term price	01/01/2018-08/31/2019	MSE, RMSE, MAPE
RNN, LSTM [81]	BTC	Close price	01/01/2014-01/31/2019	RMSE
CNN [82]	Various	Sentiment analysis	03/01/2013-05/31/2018	-
Transformer [39]	DOGE	Close price	07/05/2019-04/28/2021	Accuracy, R-squared
Deep Reinforcement Learning [83]	Various	Portfolio management	08/17/2017-11/01/2019	Total return, Sharpe ratio
Reinforcement Learning [85]	LTC, Monero	Price prediction	2016-2020	MAE, MSE, RMSE, MAPE
Jordan RNN [86]	BTC, XRP, ETH	Volatility	04/28/2013-12/15/2019	MSE, MAPE
Deep Reinforcement Learning [87]	Various	Limit order placement	01/01/2018-06/30/2019	Total return, Sharpe ratio
LSTM, sentiment analysis [88]	Dash, BTC-Cash	Price prediction	-	MSE, MAE, MAPE

2.4 Cryptocurrency volatility and prediction

Several researchers have concentrated on analysing and predicting the volatility of cryptocurrencies. Volatility in the cryptocurrency market is a significant factor that influences numerous decisions in business and finance [91]. Volatility spillovers between the cryptocurrency market and other financial markets have recently been identified [86]. Katsiampa [16] employed an Asymmetric Diagonal BEKK model to examine the volatility dynamics in the cryptocurrency market, revealing significant interdependencies and responsiveness to major news in the volatility levels of major cryptocurrencies such as Bitcoin, Ether, Ripple, Litecoin, and Stellar Lumen. Woebbeking [92] developed the CVX index using a model-free approach derived from cryptocurrency option prices, unveiling that cryptocurrency volatility often diverges from traditional financial markets, and is distinctly reactive to major market events. Yen and Cheng [91] utilised stochastic volatility models to analyse the impact of the Economic Policy Uncertainty (EPU) index on cryptocurrency volatility, finding that China’s EPU uniquely predicts the volatility of Bitcoin and Litecoin, suggesting these cryptocurrencies might serve as hedging tools against EPU risks. Cross, Hou, and Trinh [93] utilised a time-varying parameter model to explore the returns and volatility of cryptocurrencies during the 2017–18 bubble, highlighting a significant risk premium effect in Litecoin and Ripple and identifying adverse news effects as key drivers of the 2018 crash across Bitcoin, Ethereum, Litecoin, and Ripple. Ftiti, Louhichi, and Ben Ameur [94] utilised heterogeneous autoregressive (HAR) models with high-frequency data to explore cryptocurrency volatility during the COVID-19 pandemic.
Their findings underscore the predictive superiority of models incorporating both positive and negative semi-variances, especially during the crisis, suggesting these models can effectively capture the asymmetric dynamics of market volatility. Yin, Nie, and Han [95] applied the Generalized Autoregressive Conditional Heteroskedasticity - Mixed Data Sampling (GARCH-MIDAS) model to explore the influence of oil market shocks on the volatility of Bitcoin, Ethereum, and Ripple. Their analysis revealed that oil market shocks, both supply and demand types, significantly affect the long-term volatility of these cryptocurrencies, thereby suggesting potential hedging capabilities against oil-induced economic uncertainties.

There is also some research on the prediction of cryptocurrency volatility. Catania, Grassi, and Ravazzolo [96] employed a score-driven Generalized Hyperbolic Skew Student’s t (GHSKT) model to analyze and predict the volatility of Bitcoin, Ethereum, Litecoin, and Ripple. They demonstrated that accounting for long memory and asymmetric reactions to past shocks enhances the model’s predictive accuracy significantly across various forecast horizons. Catania and Grassi [97] further employed the GHSKT model demonstrating that the model’s ability to incorporate higher-order moments and leverage effects significantly enhances the accuracy of volatility forecasts across various cryptocurrencies. Ma et al. [98] employed the Markov Regime-Switching Mixed Data Sampling (MRS-MIDAS) model to forecast cryptocurrency volatility, particularly focusing on Bitcoin. They enhanced the standard MIDAS approach by incorporating jump-driven time-varying transition probabilities, which allowed the model to capture dynamic changes in volatility states influenced by market jumps.
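Many of the volatility studies above start from returns rather than prices. As a minimal sketch, the code below computes log returns, realised variance, and the positive and negative realised semi-variances of the kind referred to in the discussion of [94], using their standard definitions. The function names and the sample prices are assumptions for illustration, and the exact estimators in the cited papers may differ.

```python
import math


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def realized_variance(returns):
    """Sum of squared returns over the sampling window."""
    return sum(r * r for r in returns)


def semivariances(returns):
    """Split realised variance into upside and downside components."""
    positive = sum(r * r for r in returns if r > 0)
    negative = sum(r * r for r in returns if r < 0)
    return positive, negative


prices = [100.0, 102.0, 101.0, 105.0, 103.0]  # illustrative values only
returns = log_returns(prices)
rv = realized_variance(returns)
rs_pos, rs_neg = semivariances(returns)
print(rv, rs_pos, rs_neg)
```

The two semi-variances add up to the realised variance, so using them as separate regressors lets a model respond differently to upward and downward moves.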

Table 4 provides an overview of sample research that focuses on cryptocurrency volatility and prediction. The table reports metrics such as the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE). We also mention the time periods of data used in these papers.

Methods	Data	Target predictor	Time range (month/day/year)	Metric
Asymmetric Diagonal BEKK model [16]	BTC, ETH, XRP, LTC, XLM	Volatility dynamics	08/07/2015-02/10/2018	Past squared errors, past conditional volatility
Model-free volatility, CVX index [92]	BTC	Volatility dynamics	02/06/2020-07/06/2021	Not specified
Stochastic volatility models [91]	BTC, LTC	EPU impact on cryptocurrency volatility	02/2014-06/2019	Not specified
Time-varying parameter stochastic volatility model [93]	BTC, ETH, LTC, XRP	Returns and volatility dynamics	01/2017-01/2019	Forecast accuracy, MSFE, ALPL
Heterogeneous autoregressive models [94]	BTC, ETH, ETC, XRP	Volatility forecasting	04/01/2018-06/30/2020	MSE, MAE, MAPE
GARCH-MIDAS model [95]	BTC, ETH, XRP	Impact of oil market shocks on volatility	04/28/2013-12/31/2018	MAE, MAPE, RMSE
Score-driven Generalized Hyperbolic Skew Student’s t model [96]	BTC, ETH, LTC, XRP	Volatility forecasting	04/29/2013-12/01/2017	Quasi-Like loss function
Score-driven Generalized Hyperbolic Skew Student’s t model [97]	BTC, ETH, LTC, XRP	Volatility forecasting	-	MSE, MAE, MAPE
Markov Regime-Switching Mixed Data Sampling model [98]	BTC	Volatility forecasting	03/01/2013-09/29/2018	Quasi-Like loss function, MSE, MAE
GARCH-MIDAS model [99]	Gold, Silver	Cryptocurrency uncertainty impact on precious metal volatility	01/02/2014-05/13/2022	Diebold-Mariano test, R-square, Model Confidence Set test, Direction-of-Change rate test

3 Methodology: Implementation and Evaluation

3.1 Conventional models

3.1.1 ARIMA

The ARIMA model, often known as the Box-Jenkins model [100], is a commonly used statistical/econometric model for forecasting time series data. The ARIMA model consists of three components: autoregressive (AR), integrated (I), and moving average (MA). The integrated component represents the amount of differencing required to transform the series data into a stationary representation. The autoregressive component describes the relationship between the present value of a time series and its previous values, capturing their correlation. The moving average component indicates the correlation between the current observation and its previous error term. This component assists the model in capturing stochastic variations in the time series. The three components constitute the three parameters $p$ , $d$ , and $q$ in the model. $p$ represents the number of lag observations in the autoregressive part. $d$ is the order of differencing, which forms the integrated part, and $q$ is the number of lagged forecast errors in the moving average component.
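For concreteness, one common textbook way to write an ARIMA($p$, $d$, $q$) model uses the backshift operator $B$, defined by $By_{t}=y_{t-1}$. The symbols $c$ (constant), $\phi_{i}$ (autoregressive coefficients), $\theta_{j}$ (moving average coefficients), and $\varepsilon_{t}$ (white-noise error) are introduced here for illustration and do not appear in the description above.

```latex
\left(1-\sum_{i=1}^{p}\phi_{i}B^{i}\right)(1-B)^{d}\,y_{t}
  = c+\left(1+\sum_{j=1}^{q}\theta_{j}B^{j}\right)\varepsilon_{t}
```

Here $(1-B)^{d}$ applies the differencing $d$ times, the left-hand sum is the autoregressive part over $p$ lags, and the right-hand sum is the moving average part over $q$ lagged errors. For example, with $d=1$ the model is fitted to the first differences $y_{t}-y_{t-1}$ rather than to the price levels.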

3.1.2 Multilayer perceptron

A simple neural network, also known as the multilayer perceptron (MLP), is a machine learning model that features an input layer, an output layer, and at least one hidden layer. Figure 1 illustrates the architecture of the MLP. The MLP requires a training algorithm to update the weights and biases so that the output (prediction) of the network resembles the actual observations (training data). Each unit in the hidden and output layers computes the weighted sum of its inputs, followed by an activation function, given by

\begin{split}h_{W,b}(x)&=f(Wx+b)\\&=f(\sum_{i=1}^{n}W_{i}x_{i}+b)\end{split}

(1)

where $x$ is the input, $f(\cdot)$ is the activation function, $b$ is the bias, $n$ is the number of input units and $W$ is the weight.
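As an illustration of Equation 1, the following minimal Python sketch applies the weighted sum and activation layer by layer. The layer sizes, weight values and the choice of `tanh` as the hidden activation are hypothetical and are used only to make the example concrete; they are not the configuration used in this study.

```python
import math

def dense(x, W, b, f):
    """One layer of Equation 1: f(Wx + b), with W given as a list of rows."""
    return [f(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]

def identity(a):
    """Linear output activation (an assumption for this sketch)."""
    return a

# Hypothetical sizes and weights, chosen only for illustration.
x = [0.5, -1.0, 2.0]                 # n = 3 input units
W_hidden = [[0.1, 0.2, 0.3],
            [-0.4, 0.5, 0.6]]        # 2 hidden units
b_hidden = [0.0, 0.1]
W_out = [[1.0, -1.0]]                # 1 output unit
b_out = [0.2]

hidden = dense(x, W_hidden, b_hidden, math.tanh)
output = dense(hidden, W_out, b_out, identity)
```

Training would then adjust `W_hidden`, `b_hidden`, `W_out` and `b_out` so that `output` approaches the observed target.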

3.2 Deep learning models

3.2.1 Variants of LSTM networks

Recurrent neural networks (RNNs) are well known for modelling temporal sequences. They are distinguished by their context layers, which memorise information from prior inputs to influence future outputs. There are several simple RNN architectures, such as the Elman RNN [47] (also known as the simple RNN), which was one of the earliest attempts at effectively modelling temporal sequences. Figure 2 shows the architecture of the Elman RNN. Trainable weights connect every pair of adjacent layers. A context (state or memory) layer stores the output of the state neurons computed at previous time steps, which makes the network appropriate for capturing time-varying patterns in data.

However, simple RNNs are difficult to train because of the vanishing gradient problem [101], which arises when handling long-term dependencies in sequence data. The LSTM network is considered to be an enhanced version of the RNN [36]. The LSTM overcomes the vanishing gradient problem by retaining long-term dependencies through memory cells in the hidden layer. We present the architecture of the LSTM network in Figure 3, showing how information is passed through the LSTM memory cells in the hidden layer. The LSTM cell is designed as a unit that can retain input information over a long period, hence addressing the problem of learning long-term dependencies in sequence data. The LSTM cell calculates a hidden state output $h_{t}$ by

\begin{split}f_{t}&=\sigma(W_{f}[h_{t-1},x_{t}]+b_{f})\\i_{t}&=\sigma(W_{i}[h_{t-1},x_{t}]+b_{i})\\o_{t}&=\sigma(W_{o}[h_{t-1},x_{t}]+b_{o})\\z&=\tanh(W_{z}[h_{t-1},x_{t}]+b_{z})\\C_{t}&=f_{t}*C_{t-1}+i_{t}*z\\h_{t}&=o_{t}*\tanh(C_{t})\end{split}

(2)

where $f_{t}$ , $i_{t}$ and $o_{t}$ refer to the forget gate, input gate and output gate, respectively. $W$ denotes the weight matrices, which are adjusted during learning along with the biases $b$ . $x_{t}$ is the input at time $t$ (its dimension is the number of input features), and $h_{t}$ is the hidden state (its dimension is the number of hidden units). $z$ is the intermediate cell state, and $C_{t}$ is the current cell memory.
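To make the flow of information through the cell concrete, the sketch below computes a single LSTM step with the forget, input and output gates and the intermediate cell state of Equation 2, using plain Python lists. The helper names (`affine`, `lstm_step`), the dictionary layout of the parameters and the numerical values are assumptions made for illustration only.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def affine(W, b, v):
    """Compute W v + b for a weight matrix W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_j for row, b_j in zip(W, b)]

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM step; params maps 'f', 'i', 'o', 'z' to a (W, b) pair."""
    hx = h_prev + x_t                                    # concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"], hx)]    # forget gate
    i_t = [sigmoid(a) for a in affine(*params["i"], hx)]    # input gate
    o_t = [sigmoid(a) for a in affine(*params["o"], hx)]    # output gate
    z = [math.tanh(a) for a in affine(*params["z"], hx)]    # intermediate cell state
    C_t = [f * c + i * zz for f, c, i, zz in zip(f_t, C_prev, i_t, z)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, C_t)]
    return h_t, C_t

# Hypothetical example: 2 hidden units, 1 input feature, small constant weights.
hidden_units, input_features = 2, 1
W = [[0.1] * (hidden_units + input_features) for _ in range(hidden_units)]
b = [0.0] * hidden_units
example_params = {gate: (W, b) for gate in ("f", "i", "o", "z")}
h_t, C_t = lstm_step([0.5], [0.0, 0.0], [0.0, 0.0], example_params)
```

Applying `lstm_step` repeatedly over a window of inputs, carrying `h_t` and `C_t` forward, gives the hidden state that a final dense layer would map to the forecast.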

3.2.2 Convolutional neural networks

CNNs are one of the most prominent deep learning models, initially designed for computer vision and image processing tasks [107, 108, 109, 110]. Their application spans diverse areas, notably detection [111] and segmentation tasks [112], where they have shown superior accuracy when compared to traditional machine learning models. A CNN typically comprises several layers, including convolutional, pooling, and fully connected layers.

Subsequently, the fully connected layer, akin to conventional neural networks, ensures a dense interconnection between the nodes of consecutive layers. CNNs identify hierarchical patterns (features) in the data through iterative convolution and pooling, culminating in a fully connected layer that consolidates these features for the final task output. This structural design has been pivotal for their proficiency in handling tasks related to image processing.

Given that our dataset consists of univariate time series for stock price, it is crucial to modify the conventional two-dimensional CNN to suit our problem. Consequently, we have integrated a specialised function into our model that processes a set of stock data inputs, along with specifications such as filter count, filter width, and stride length, while the kernel’s height remains irrelevant. This function initialises the filter values using a Gaussian distribution and sets the biases to zero. It generates several matrices as outputs, whose quantity corresponds to the filter count. These matrices contribute to feature extraction within the CNN model and, after the activation function is applied, serve as inputs to the subsequent pooling layer. We note that in the case of multivariate time series data, the conventional 2D-CNN can be utilised.
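The one-dimensional convolution described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function names, the standard deviation of the Gaussian initialisation and the random seed are hypothetical, while the inputs (filter count, filter width, stride) and the zero-initialised biases follow the description in the text.

```python
import random

def init_filters(filter_count, filter_width, std=0.1, seed=0):
    """Gaussian-initialised filters and zero biases (std and seed are assumed values)."""
    rng = random.Random(seed)
    filters = [[rng.gauss(0.0, std) for _ in range(filter_width)]
               for _ in range(filter_count)]
    biases = [0.0] * filter_count
    return filters, biases

def conv1d(series, filters, biases, stride=1):
    """Slide each filter along the series; returns one feature map per filter."""
    width = len(filters[0])
    feature_maps = []
    for kernel, bias in zip(filters, biases):
        feature_map = [sum(k * series[start + j] for j, k in enumerate(kernel)) + bias
                       for start in range(0, len(series) - width + 1, stride)]
        feature_maps.append(feature_map)
    return feature_maps

# Hypothetical usage on a short window of scaled prices.
filters, biases = init_filters(filter_count=3, filter_width=2)
maps = conv1d([0.1, 0.2, 0.4, 0.3, 0.5, 0.6], filters, biases, stride=1)
```

Each entry of `maps` corresponds to one filter; an activation function and a pooling layer would be applied to these feature maps next.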

The activation function is essential for optimising model performance. Activation functions such as the hyperbolic tangent (Tanh), rectified linear unit (ReLU), Sigmoid, and leaky ReLU are typically employed in CNNs. We opt for ReLU and Leaky ReLU, which are prominent in the literature and can also mitigate vanishing gradients, as given below.

\begin{split}\text{ReLU}(z)&=\begin{cases}0&\text{if }z\leq 0,\\z&\text{if }z>0,\end{cases}\\\text{Leaky-ReLU}(z_{i})&=\begin{cases}\alpha_{i}z_{i}&\text{if }z_{i}\leq 0,\\z_{i}&\text{if }z_{i}>0,\end{cases}\end{split}

(3)

where $z$ and $z_{i}$ are convolution outcomes, and $\alpha_{i}$ is a user-defined hyperparameter for convolutional layer $i$ , typically starting at 0.01. We select ReLU for the initial convolutional layer to address the vanishing gradient issue, followed by Leaky-ReLU in subsequent layers as shown in Figure 6. We train the CNN model by minimising the error defined by the loss function using the Adam optimiser [113] with user-defined learning rate $\lambda=0.0001$ .
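The two activation functions of Equation 3 can be written directly in Python, as in the short sketch below. The default slope of 0.01 is the typical starting value mentioned in the text; the function names are illustrative.

```python
def relu(z):
    """ReLU: zero for non-positive inputs, identity otherwise."""
    return z if z > 0 else 0.0

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: small slope alpha for non-positive inputs, identity otherwise."""
    return z if z > 0 else alpha * z

# Apply both activations element-wise to a hypothetical feature map.
feature_map = [-2.0, -0.5, 0.0, 1.5]
relu_out = [relu(z) for z in feature_map]
leaky_out = [leaky_relu(z) for z in feature_map]
```

Unlike ReLU, the leaky variant keeps a small non-zero gradient for negative inputs, which is why it is often preferred in deeper layers.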

3.2.3 Convolutional LSTM networks

The convolutional LSTM (Conv-LSTM) network [114] was initially introduced for weather forecasting problems. This network extends the original fully connected LSTM by replacing the matrix multiplications of the LSTM cell with convolutions. We use $*$ to denote the convolution operation and $\circ$ to denote the Hadamard product. The key equations of the Conv-LSTM cell are expressed as:

\begin{split}f_{t}&=\sigma(W_{xf}*x_{t}+W_{hf}*h_{t-1}+W_{cf}\circ c_{t-1}+b_{f})\\i_{t}&=\sigma(W_{xi}*x_{t}+W_{hi}*h_{t-1}+W_{ci}\circ c_{t-1}+b_{i})\\o_{t}&=\sigma(W_{xo}*x_{t}+W_{ho}*h_{t-1}+W_{co}\circ c_{t}+b_{o})\\c_{t}&=f_{t}\circ c_{t-1}+i_{t}\circ\tanh(W_{xc}*x_{t}+W_{hc}*h_{t-1}+b_{c})\\h_{t}&=o_{t}\circ\tanh(c_{t})\end{split}

(4)

where $f_{t}$ , $i_{t}$ , $o_{t}$ and $h_{t}$ refer to the forget gate, input gate, output gate and hidden state, respectively. $W$ denotes the weight matrices, which are adjusted during learning along with the biases $b$ . Also, the past status $c_{t-1}$ can be regarded as “forgotten” in the process, and $c_{t}$ is the current cell memory. These equations are similar to Equation 2. The Conv-LSTM model has the ability to capture both the spatial and temporal relationships in the data at the same time, resulting in more precise predictions. In our implementation, we utilise 1D convolutions in the Conv-LSTM for univariate time series and 2D convolutions for multivariate time series forecasting.

3.2.4 Transformer Networks

The Transformer model is an extension of the encoder-decoder LSTM architecture which has been widely used in machine translation problems [115]. The encoder condenses the essential data of the input sequence into a vector of fixed length, which is subsequently transformed into an output by the decoder [105]. The design of the decoder offers a method for managing lengthy sequential data [116].

Analogously, we input the sequential data to a vector representation layer. Given the input sequence $X=\{x_{i}:i=1,\ldots,N\}\in\mathbb{R}^{N}$ , the $m$ -dimensional embedding layer yields a matrix $B\in\mathbb{R}^{N\times m}$ through a dense network.

We need to incorporate temporal encoding with the vectorised input to encapsulate the temporal structure of the time series. Hence, employing sine and cosine functions at distinct frequencies to represent temporal information, we define:

	$\displaystyle\text{TE}_{(i,2k)}$	$\displaystyle=\sin\left(i/10000^{2k/m}\right),$
	$\displaystyle\text{TE}_{(i,2k+1)}$	$\displaystyle=\cos\left(i/10000^{2k/m}\right),$

where $1\leq 2k\leq m$ . The temporal encoding, hence, is TE $\in\mathbb{R}^{N\times m}$ . The vector representations alongside the temporal encodings are then concatenated and provided to the encoder layers.
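The temporal encoding above can be computed as in the following sketch, which fills even columns with the sine term and odd columns with the cosine term. Indexing time steps from zero and the function name `temporal_encoding` are assumptions of this illustration.

```python
import math

def temporal_encoding(N, m):
    """Return an N x m matrix TE with sine in even columns and cosine in odd columns."""
    TE = [[0.0] * m for _ in range(N)]
    for i in range(N):                  # time step (indexed from 0 here, an assumption)
        for j in range(m):              # embedding dimension
            k = j // 2                  # columns 2k and 2k + 1 share one frequency
            angle = i / (10000 ** (2 * k / m))
            TE[i][j] = math.sin(angle) if j % 2 == 0 else math.cos(angle)
    return TE

# Hypothetical sizes: a window of 6 time steps and an 8-dimensional embedding.
TE = temporal_encoding(N=6, m=8)
```

Low-index columns oscillate quickly with the time step and high-index columns slowly, so each time step receives a distinct pattern across the $m$ dimensions.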

A concise overview of the complete framework of our Transformer model is delineated in Figure 7. The encoder depicted in Figure 7 consists of $M$ identically structured layers. Each layer is equipped with two sub-layers: a multi-head self-attention mechanism and a fully connected feed-forward network. Both sub-layers incorporate residual connections and normalization to enhance their functionality. The decoder, also shown in Figure 7, mirrors the encoder’s structure with a notable distinction: it features an additional multi-head self-attention layer. Unlike the original decoder described in [117], this version omits the masked attention mechanism because it processes only observed historical data, which does not include future information.

The emergence of attention mechanisms marks a pivotal innovation in deep learning, focusing computational effort on the most relevant parts of the input in a manner inspired by attention in cognition. Vaswani et al. [117] revolutionized this approach by introducing the Transformer architecture, predicated on the exclusive use of self-attention mechanisms. The self-attention mechanism is defined as follows:

\text{Attention}(P,R,S)=\text{softmax}\left(\frac{PR^{\top}}{\sqrt{m}}\right)S,

(5)

where $P,R,S\in\mathbb{R}^{N\times m}$ correspond to the query, key, and value matrices derived from three separate linear transformations of the same input. The architecture of the self-attention mechanism is illustrated in Figure 7.
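A minimal sketch of Equation 5 for a single attention head is given below, using plain Python lists for the query, key and value matrices. The function names and the toy matrices are hypothetical; in practice the three matrices would be produced by learned linear transformations of the embedded input.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(P, R, S):
    """Scaled dot-product attention: softmax(P R^T / sqrt(m)) S."""
    m = len(P[0])
    scores = [[sum(p * r for p, r in zip(p_row, r_row)) / math.sqrt(m)
               for r_row in R] for p_row in P]
    weights = [softmax(row) for row in scores]
    return [[sum(w * s_row[j] for w, s_row in zip(w_row, S))
             for j in range(len(S[0]))] for w_row in weights]

# Toy example with N = 2 time steps and m = 2 dimensions (hypothetical values).
P = [[1.0, 0.0], [0.0, 1.0]]
R = [[1.0, 0.0], [0.0, 1.0]]
S = [[1.0, 2.0], [3.0, 4.0]]
context = attention(P, R, S)
```

Each row of `context` is a weighted average of the rows of `S`, with weights that sum to one and favour the keys most similar to the corresponding query.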

The self-attention mechanism has transformed the strategy of focusing on vital local content within the data. Vaswani et al. [117] expanded this idea by proposing multi-head attention, whereby several self-attention processes, or "heads", are executed in parallel, each assessing different projected versions of the queries, keys, and values. The combined outcomes of these heads are then linearly transformed to obtain the final output as shown in Figure 7.

3.2.5 Model training with Adam Optimiser

We utilise the modified Adam (adaptive moment estimation) optimiser [113] which is an extension to the stochastic gradient descent [118, 119] and further extends adaptive gradient methods (AdaGrad [120], AdaDelta [121], and RMSProp [122]). Adam is an adaptive gradient-based optimisation algorithm that computes individual adaptive learning rates for different parameters from the history of the first and second moments (mean and variance) of the gradients. Let $g_{k}\circ g_{k}$ signify the element-wise square of $g_{k}$ .

Input:

1. Step size, $\alpha$
2. Exponential decay rates for the moment estimates, $\beta_{a},\beta_{b}\in[0,1)$
3. Stochastic objective function with parameters, $f(\xi)$
4. Initial parameter vector, $\xi_{0}$

Initialise:

	$\displaystyle m_{0}$	$\displaystyle\leftarrow 0\quad\text{(Initialize 1st moment vector)}$
	$\displaystyle v_{0}$	$\displaystyle\leftarrow 0\quad\text{(Initialize 2nd moment vector)}$
	$\displaystyle k$	$\displaystyle\leftarrow 0\quad\text{(Initialize timestep)}$

Algorithm:

	$\displaystyle\text{while }\xi_{k}\text{ not converged do}$
	$\displaystyle\quad k\leftarrow k+1$
	$\displaystyle\quad g_{k}\leftarrow\nabla_{\xi}f(\xi_{k-1})$
	$\displaystyle\quad m_{k}\leftarrow\beta_{a}\cdot m_{k-1}+(1-\beta_{a})\cdot g_{k}$
	$\displaystyle\quad v_{k}\leftarrow\beta_{b}\cdot v_{k-1}+(1-\beta_{b})\cdot g_{k}\circ g_{k}$
	$\displaystyle\quad\hat{m}_{k}\leftarrow\frac{m_{k}}{(1-\beta_{a}^{k})}$
	$\displaystyle\quad\hat{v}_{k}\leftarrow\frac{v_{k}}{(1-\beta_{b}^{k})}$
	$\displaystyle\quad\xi_{k}\leftarrow\xi_{k-1}-\alpha\cdot\frac{\hat{m}_{k}}{(\sqrt{\hat{v}_{k}}+\epsilon^{\prime})}$
	end while

Output: Resulting parameters $\xi_{k}$

Adam optimisation aims to minimise the expected value of a differentiable function, such as a neural network model $f(\xi)$ whose parameters $\xi$ represent the weights and biases. The algorithm updates the exponential moving averages of the gradient $(m_{k})$ and its square $(v_{k})$ , with the hyperparameters $\beta_{a},\beta_{b}$ controlling their decay rates. The efficiency of the algorithm can be improved by folding the two bias corrections into the step size, so that the parameter update uses $\alpha_{k}=\alpha\sqrt{1-\beta_{b}^{k}}/(1-\beta_{a}^{k})$ . We note that vector operations are performed element-wise.

The key to Adam’s update mechanism is the adaptive step size, influenced by the signal-to-noise ratio $\widehat{m}_{k}/\sqrt{\widehat{v}_{k}}$ , dictating the magnitude of parameter updates. This feature allows for effective scaling of steps in parameter space, contributing to the robustness and versatility of the algorithm in various optimisation contexts.
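The update rule of the algorithm can be sketched in a few lines of Python. This illustration runs for a fixed number of steps rather than testing for convergence, and the quadratic objective, its gradient and the step size of 0.1 in the usage example are hypothetical choices made only so that the behaviour is easy to check.

```python
import math

def adam(grad, xi0, alpha=0.001, beta_a=0.9, beta_b=0.999, eps=1e-8, steps=1000):
    """Adam with bias-corrected moment estimates; all operations are element-wise."""
    xi = list(xi0)
    m = [0.0] * len(xi)     # 1st moment vector
    v = [0.0] * len(xi)     # 2nd moment vector
    for k in range(1, steps + 1):
        g = grad(xi)
        m = [beta_a * m_j + (1 - beta_a) * g_j for m_j, g_j in zip(m, g)]
        v = [beta_b * v_j + (1 - beta_b) * g_j * g_j for v_j, g_j in zip(v, g)]
        m_hat = [m_j / (1 - beta_a ** k) for m_j in m]
        v_hat = [v_j / (1 - beta_b ** k) for v_j in v]
        xi = [x_j - alpha * mh / (math.sqrt(vh) + eps)
              for x_j, mh, vh in zip(xi, m_hat, v_hat)]
    return xi

# Hypothetical objective f(xi) = (xi_0 - 3)^2 + (xi_1 + 1)^2 with minimum at (3, -1).
def objective(xi):
    return (xi[0] - 3.0) ** 2 + (xi[1] + 1.0) ** 2

def objective_grad(xi):
    return [2.0 * (xi[0] - 3.0), 2.0 * (xi[1] + 1.0)]

first_step = adam(objective_grad, [0.0, 0.0], alpha=0.1, steps=1)
solution = adam(objective_grad, [0.0, 0.0], alpha=0.1, steps=2000)
```

After the first step each parameter has moved by approximately $\alpha$ in the direction opposite to its gradient, regardless of the gradient's magnitude, which illustrates the scale invariance of the adaptive step size.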

3.3 Data

We choose four different cryptocurrencies to evaluate the performance of the respective statistical and deep learning models: Bitcoin, Ethereum, Dogecoin and Litecoin. We focus on multi-step-ahead price forecasting, where a step is defined by a day. Bitcoin is the first and most prominent cryptocurrency, which was launched in 2009 by Satoshi Nakamoto [8]. Ethereum was designed in 2013 by Vitalik Buterin and Gavin Wood [123]. Ethereum is not just a cryptocurrency but also a platform for building decentralized applications using smart contracts. After Bitcoin, Ethereum is the cryptocurrency with the second-largest market capitalisation. Dogecoin, another open-source cryptocurrency based on the popular "doge" internet meme, grew in popularity and price in 2021 after billionaire Elon Musk publicly backed it. Litecoin, created by Charlie Lee in 2011, is based on Bitcoin’s protocol but differs in terms of the hashing algorithm used: Litecoin uses scrypt, proposed by Colin Percival [124].

Due to the incompleteness of individual data sources, we combine data from two websites, Yahoo Finance (https://finance.yahoo.com/lookup) and Kaggle [125], with fundamental details summarised in Table 5. The datasets feature the historical price information of the four cryptocurrencies, with the begin and end dates and the number of data points shown in Table 6. We forecast the closing price of each cryptocurrency using univariate and multivariate deep learning models. According to Wang et al. [22], in the multivariate model, it is feasible to incorporate features of the cryptocurrency price such as the $open$ , $high$ , $low$ and $volume$ to enhance forecasting accuracy. Hence, we used these features in our multivariate models as shown in Table 6. We also add the gold price as an additional feature to the multivariate model, following Huynh et al. [26], who found a strong correlation between cryptocurrencies and the gold market. We obtained the Gold price data of the London bullion market (LBMA) from 31 December 2012 to 28 February 2022 from FactSet (https://www.factset.com/).

Cryptocurrency	Period (Day/Month/Year)	Size	Mean: Close price (USD)	Variance: Close price
Bitcoin	29/04/2013-01/04/2024	3991	13692	$2.856\times 10^{8}$
Ethereum	08/08/2015-01/04/2024	3160	983.53	$1.274\times 10^{6}$
Dogecoin	16/12/2013-01/04/2024	3760	0.0406	$5.947\times 10^{-3}$
Litecoin	29/04/2013-01/04/2024	3991	60.208	$3.884\times 10^{3}$

Variable	Variable Description	Data type
SNo	the order of the data	Number
Name	Name of cryptocurrency	Letter
Symbol	Abbreviation of cryptocurrency	Letter
Date	Date of observation	Date
High	Highest price on given day	Number
Low	Lowest price on given day	Number
Open	Opening price on given day	Number
Close	Closing price on given day	Number
Volume	Volume of transactions on given day	Number

3.4 Data processing

We need to reconstruct the original time series data for multi-step-ahead prediction using deep learning models. Takens’ embedding theorem states that the reconstruction process can replicate significant characteristics of the original time series [126]. Given an observed time series $x(t)$ , we can generate the embedded phase space $Y(t)=[x(t),x(t-T),...,x(t-(D-1)T)]$ , where $T$ is the time delay, $D$ is the embedding dimension with $t=0,1,2,...,N-D-1$ , and $N$ is the original length of the time series. Takens’ theorem demonstrates that if the original attractor has a dimension of $d$ , then an embedding dimension of $D=2d+1$ is sufficient.

In the univariate strategy, we divide a single time series into several segments (windows). The input features for each segment are the data from a fixed number of consecutive time points, and the output label(s) are the time point(s) that come immediately afterwards. Therefore, we can have single-step prediction or multi-step-ahead prediction.

In the multivariate strategy, we utilise as input vectors a window that holds multiple time series at multiple sequential time points, as shown for the case of Bitcoin price prediction in Figure 8. The input data consists of multiple time series, such as the high-price and close-price of Bitcoin and the price of Gold, to provide a multi-step prediction of the Bitcoin close price.
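The windowing for the univariate case can be sketched as follows. The sketch assumes a time delay of one step; the default window of 6 inputs and 5 outputs mirrors the input and output shapes listed later for the models, and the function name `make_windows` is hypothetical.

```python
def make_windows(series, n_in=6, n_out=5):
    """Split a series into (input window, multi-step target) pairs with unit delay."""
    inputs, targets = [], []
    for start in range(len(series) - n_in - n_out + 1):
        inputs.append(series[start:start + n_in])
        targets.append(series[start + n_in:start + n_in + n_out])
    return inputs, targets

# Hypothetical toy series of 12 daily values.
toy_series = list(range(12))
X, Y = make_windows(toy_series)
```

Here the 12-point toy series yields two samples: the first maps days 0 to 5 onto days 6 to 10, and the second maps days 1 to 6 onto days 7 to 11.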

3.5 Framework

We present our framework in Figure 9, which highlights the major components of the entire process. In Step 1, we extract and pre-process the data for analysis. Since the Gold price data only has values on trading days (Monday to Friday), we use interpolation to fill in prices on non-trading days (Saturdays, Sundays and public holidays). We use linear interpolation (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html#pandas.DataFrame.interpolate), which fills in each missing value along a straight line between the nearest observed values on either side. This is commonly used to handle missing values in time series data [127].
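The effect of linear interpolation on weekend gaps can be illustrated with the pure-Python sketch below. It is a stand-in written for clarity and is not the pandas routine itself; the function name and the price values are hypothetical, and missing values at the very start or end of the series are simply left unfilled.

```python
def interpolate_linear(values):
    """Fill None entries that lie between two observed values by linear interpolation."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled

# Hypothetical gold prices: Friday, Saturday (missing), Sunday (missing), Monday.
gold = [1800.0, None, None, 1806.0]
gold_filled = interpolate_linear(gold)
```

For a single missing day the filled value is the average of its two neighbours; for a two-day weekend the values are spaced evenly between the Friday and Monday prices.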

The data pre-processing in Step 2 features data scaling to ensure model stability, which limits the price to the range defined by the min-max scaler (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html). Furthermore, we separate the data into two sets for comparing the influence of COVID-19 and also determine the train-test data split, which is based on a given timeline (not shuffled). Our goal is to predict the closing price of the respective cryptocurrencies; therefore, we have univariate data with the close price, and multivariate data with the close price, gold price, high price, low price and opening price as shown in Table 1. We split each dataset in a 70:30 ratio. We use the data for the selected cryptocurrency from its first available date to June 2021 as Dataset 1, where the last 30% of the data includes the period following the beginning of COVID-19 (March 2020 to June 2021), during which high volatility in crypto prices was evident. Since cryptocurrency is a financial asset with highly volatile prices [15], we explore which model is more effective in predicting the rising trend following the outbreak of COVID-19. Dataset 2 features the COVID-19 period in both the train and test datasets (March 2020 to April 2024) to ensure that high-volatility price data is part of the training dataset. Table 7 presents the dates for the train and test datasets for both experiments.
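A chronological 70:30 split followed by min-max scaling can be sketched as below. Fitting the minimum and maximum on the training portion only is an assumption of this sketch (the text does not state which portion the scaler is fitted on), and the function names and price values are hypothetical.

```python
def chronological_split(series, train_percent=70):
    """Split a time-ordered series without shuffling; the first part is for training."""
    cut = len(series) * train_percent // 100
    return series[:cut], series[cut:]

def fit_min_max(train):
    """Return the (minimum, maximum) used for scaling; fitted on training data here."""
    return min(train), max(train)

def min_max_scale(values, lo, hi):
    """Map values linearly so that lo becomes 0 and hi becomes 1."""
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical close prices in time order.
prices = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
train, test = chronological_split(prices)
lo, hi = fit_min_max(train)
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)
```

Under this assumption, test values can fall outside the unit interval when test prices exceed the training range, as happens here; fitting the scaler on the full series would avoid this at the cost of using information from the test period.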

Crypto	Experiment	Split	Period (Day/Month/Year)
Bitcoin	Dataset 1	Train	29/04/2013-27/12/2018
		Test	28/12/2018-01/06/2021
	Dataset 2	Train	01/03/2020-09/01/2023
		Test	10/01/2023-01/04/2024
Ethereum	Dataset 1	Train	08/08/2015-02/09/2019
		Test	03/09/2019-01/06/2021
	Dataset 2	Train	01/03/2020-09/01/2023
		Test	10/01/2023-01/04/2024
Dogecoin	Dataset 1	Train	16/12/2013-06/03/2019
		Test	07/03/2019-01/06/2021
	Dataset 2	Train	01/03/2020-09/01/2023
		Test	10/01/2023-01/04/2024
Litecoin	Dataset 1	Train	29/04/2013-27/12/2018
		Test	28/12/2018-01/06/2021
	Dataset 2	Train	01/03/2020-09/01/2023
		Test	10/01/2023-01/04/2024

In Step 3, we select the optimal hyperparameters for each model using trial runs, knowledge from the same model runs in the literature (e.g. [40]), and the default values in the library implementation (e.g. PyTorch [128]). We note that we use RMSE as the accuracy measure for all the models in our framework. Once the best parameters have been determined, we continue with our investigations that compare univariate and multivariate deep learning models.

In Step 4, we provide data analysis by first implementing a volatility analysis of the close price of the selected cryptocurrencies, which can reveal fluctuations and patterns throughout the selected period. Historically, cryptocurrencies featured a wide price range, with major fluctuations across the COVID-19 period [129, 130]. We use the volatility analysis to review the fluctuations, since they can lead to instability during the model training process, causing slower convergence and poor generalisation ability on the test dataset. Also in Step 4, taking into account our multivariate models, we provide a feature correlation analysis to find out how the different features affect each other.

We next compare the respective models in Step 5 with Experiment 1 (pre-COVID-19 training data) and select the two best-performing models for the next step. We develop and compare multivariate and univariate deep learning models, including LSTM, BD-LSTM, ED-LSTM, CNN, Conv-LSTM, and Transformer, which predict the close price of the selected cryptocurrencies. We use MLP and ARIMA models as baseline models for the Bitcoin dataset.

In Step 6, we use Dataset 2 for Experiment 2 to predict the close price during COVID-19 using training data that features the effect of COVID-19 on the cryptocurrencies. We do this to determine whether the prediction accuracy has improved, and hence compare the results with Experiment 1. We also incorporate a shuffled data-splitting strategy to improve model performance, where the initial 70% of the data (the training portion) is randomly rearranged.

3.6 Technical details

In order to distinguish the model performance, we use RMSE as the criterion for the different prediction horizons. The smaller the RMSE values, the better the prediction accuracy:

RMSE=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_{i}-\hat{y_{i}})^{2}}

(6)

where $y_{i}$ and $\hat{y_{i}}$ are the observed data and predicted data, respectively. $N$ is the length of observed data.
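Equation 6 translates directly into code, as the following short sketch shows. The function name and the sample values are illustrative.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences (Equation 6)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))

# Hypothetical scaled close prices and forecasts for a 5-step horizon.
actual = [0.50, 0.52, 0.51, 0.55, 0.58]
forecast = [0.49, 0.53, 0.50, 0.57, 0.55]
error = rmse(actual, forecast)
```

For multi-step-ahead prediction, the same function can be applied separately to each prediction horizon and to the values pooled over all horizons.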

As indicated earlier, we use the Adam optimiser for all the deep learning models, where we use default values for the hyperparameters, i.e. $\alpha=0.001,\beta_{a}=0.9,\beta_{b}=0.999$ , and $\epsilon^{\prime}=1e-8$ .

In the case of the Transformer and ARIMA models, we reviewed the literature [39, 131, 70] to obtain the hyperparameters. Table 9 describes the details of the model hyperparameters, including the number of input layers, output layers, hidden layers and other hyperparameters. We use the ReLU activation function in the respective deep learning models with a maximum training time of 200 epochs via the Adam optimiser [113].

We implemented specific experiments to determine the appropriate hyperparameters for each model. Based on related models in the literature [40], we used the following model architectures: the CNN and LSTM variants feature one hidden layer with a selected number of hidden units. We refer to previous research on the MLP model for time series prediction [132] and evaluate the performance for selected numbers of hidden neurons, as shown in Table 8. We use the first 70% of the Bitcoin close-price in Dataset 1 for training and the remainder for testing. For each hyperparameter configuration, we repeated model training 5 times with different initial parameters and reported the average. Table 8 presents the performance (RMSE) of each model on the test dataset for the hyperparameters, with the best values in bold.

Model	Hidden	Train	Test
LSTM	20	0.0194	0.0696
	50	0.0176	0.0490
	100	0.0165	0.0370
BD-LSTM	20	0.0195	0.0239
	50	0.0177	0.0210
	100	0.0170	0.0204
ED-LSTM	20	0.0192	0.0930
	50	0.0169	0.0609
	100	0.0164	0.0373
Conv-LSTM	20	0.0124	0.0176
	50	0.0123	0.0181
	100	0.0132	0.0194
CNN	20	0.0126	0.0206
	50	0.0128	0.0209
	100	0.0135	0.0235
MLP	5	0.0137	0.0278
	10	0.0130	0.0268
	20	0.0122	0.0195

Model	Input layers	Hidden layers	Output layers	Comments
MLP	(6,1)	3	(1,5)	Include three hidden layers.
ARIMA	-	-	-	Construct ARIMA(1,0,1) model.
LSTM	(6,1)	2	(1,5)	Include two LSTM layers.
BD-LSTM	(6,1)	2	(1,5)	Include Forward&Backward LSTM layer.
ED-LSTM	(6,1)	4	(1,5)	Two LSTM networks with a time distributed layer.
Conv-LSTM	(6,1)	3	(1,5)	Include Conv1D layer, LSTM network and dense layer.
CNN	(6,1)	4	(1,5)	Include Conv1D layer, pooling layer and two dense layers.
Transformer	(6,1)	2	(1,5)	Include a Multi-Head Attention mechanism and a position-wise fully connected feed-forward network.

4 Results

In this section, we provide comprehensive information about the datasets and present research design with computational results.

4.1 Data analysis

The coronavirus disease 2019 (COVID-19) pandemic [133] originated 17th November 2019 in Wuhan, China, and extensively began spreading from March 2020 [134] worldwide. COVID-19 had a devastating effect on the world economy, and its impact included finance, supply chain, politics, and mental health [135], with further effects in the post-pandemic era [136, 137, 138]. Therefore, it is necessary to analyze the price trends of the four cryptocurrencies in our study.

We investigate the trends for the four selected cryptocurrencies over the given period covering COVID-19. We cover all the phases of COVID-19, including its initiation, spread, and decline. Figure 10 presents the close price of Bitcoin, Ethereum, Dogecoin and Litecoin across the selected period, with the shaded region (pink) indicating COVID-19. We can observe that the closing price of each cryptocurrency exhibited large fluctuations within the shaded region. Litecoin experienced significant volatility before the beginning of COVID-19, while the price fluctuations of the other three cryptocurrencies (Bitcoin, Dogecoin and Ethereum) before COVID-19 were not significant. This indicates that cryptocurrency prices became more volatile after the onset of COVID-19. We observe that the Ethereum trend is highly correlated with Bitcoin before and during COVID-19. In the case of Bitcoin and Ethereum, there is a significant price increase from 2020 to 2022, followed by a decrease and then another increase as the price recovered.

Next, we present the monthly volatility plot in Figure 11, where the COVID-19 period is again highlighted (pink). We observe that the monthly volatility of Ethereum and Litecoin generally lies below 10% during COVID-19, and that of Bitcoin lies below 6% during the same time; however, Dogecoin presents a different trend. The monthly volatility of Dogecoin reached above 20% in January and May 2021, and remained consistently at around 15% during other months. Our analysis reveals that, for all four cryptocurrencies, volatility decreases significantly in the month following a period of high volatility. The monthly volatility during COVID-19 is generally similar to the monthly volatility prior to the pandemic (2018 onwards). Although the monthly volatility does not change significantly, the daily close price fluctuates significantly across the entire period.

Since we will develop a multivariate model, we also need to analyse how the different features of each cryptocurrency (low, high, open, and close price) are correlated with the Gold price. Figure 12 shows the correlations between the features of the multivariate model for each cryptocurrency using the Pearson correlation. We observe that the close-price is highly correlated with the low-price, high-price and open-price. We observe that there is a lower correlation between Gold and the other features; however, we will use Gold in our multivariate model as data that is outside the crypto ecosystem, but linked to it. We also find that the Gold price has the highest correlation with Bitcoin, followed by Ethereum and Litecoin, and the lowest with Dogecoin. Figure 13 presents the Pearson correlation for the respective features, including the close, high, low and opening price for a given cryptocurrency, with the Gold price and the most correlated other cryptocurrency (using Figure 12), which is Ethereum in the case of Bitcoin, i.e. Figure 13, Panel (a). We will use this for the multivariate prediction strategy with the data processing shown in Figure 8.
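For reference, the Pearson correlation between two features can be computed as in the sketch below. The function name and the sample values are hypothetical and do not come from the datasets used in this study.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)
    return covariance / math.sqrt(spread_a * spread_b)

# Hypothetical daily values for a close price and a high price.
close_price = [100.0, 102.0, 101.0, 105.0, 107.0]
high_price = [101.0, 104.0, 102.0, 106.0, 109.0]
r_close_high = pearson(close_price, high_price)
```

Values near 1 indicate that the two features move together almost linearly, values near -1 that they move in opposite directions, and values near 0 that there is little linear relationship.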

4.2 Results: pre-COVID-19

We next implement the investigations outlined in Step 5 (Experiment 1) of the framework (Figure 9), where we compare the selected deep learning models and the univariate and multivariate strategies using the pre-COVID-19 training dataset. Note that our test dataset includes the first phase of COVID-19 (Table 3).

We present the results for each prediction horizon (step) obtained from 30 independent experimental runs (mean RMSE and 95% confidence interval) that feature model training using different initial weights and biases. We note that robustness is the degree of confidence in a forecast, which is indicated by a narrow confidence interval. Moreover, scalability refers to the capacity to maintain constant performance as the prediction horizon expands. Our main focus is the performance (RMSE) on the test dataset, both in terms of the mean of the 5 prediction horizons and the individual prediction horizons. Therefore, in the rest of the discussion, we focus on the test dataset.
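The mean and 95% confidence interval over repeated runs can be computed as sketched below. The sketch assumes a normal approximation with the critical value 1.96; the text does not state which interval construction was used, and the function name and RMSE values are hypothetical.

```python
import math
import statistics

def mean_ci95(values):
    """Return the mean and the half-width of a normal-approximation 95% interval."""
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, 1.96 * standard_error   # 1.96 is the assumed normal critical value

# Hypothetical test RMSE values from five independent runs.
run_rmse = [0.021, 0.019, 0.024, 0.020, 0.022]
rmse_mean, rmse_half_width = mean_ci95(run_rmse)
```

The result would be reported as the mean plus or minus the half-width; a smaller half-width across runs corresponds to a more robust model in the sense used above.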

We first use the Bitcoin data to evaluate conventional models (MLP and ARIMA) against deep learning models (LSTM, ED-LSTM, BD-LSTM, CNN, Conv-LSTM, Transformer) for the univariate (Figure 14) and multivariate (Figure 15) strategies. The results show that MLP and ARIMA perform worse than the deep learning models. MLP exhibits a lack of robustness, and the ARIMA model struggles in test prediction accuracy when compared to the deep learning models. We note that ARIMA does the best on the train dataset due to over-training and struggles in generalisation ability. The deep learning model results are consistent with the finding by Chandra et al. [40], where the prediction accuracy of deep learning models is better than that of conventional machine learning models for multi-step-ahead time-series forecasting. The prediction performance of each model shows a trend where the best Multivariate strategy (ED-LSTM) provides consistent accuracy as the prediction horizon changes when compared to the best Univariate strategy (BD-LSTM). In Figure 15, the Multivariate strategy shows that Conv-LSTM provides the lowest prediction accuracy, while the ED-LSTM and BD-LSTM models provide the most accurate predictions. In Figure 14, contrary to the results of the Multivariate strategy, the most robust Univariate model for predicting Bitcoin is Conv-LSTM.

Figure 16 presents the results for Ethereum using the Univariate strategy, where we observe that LSTM provides the best test performance, followed by BD-LSTM. Figure 17 provides the results for the Multivariate strategy, where CNN provides the best performance, followed by Conv-LSTM. Notably, the Transformer model provides the worst performance. In comparison to the Multivariate strategy, we notice that the Univariate strategy provides a much better test accuracy (Table 10), which is also more robust and scalable, i.e. higher prediction horizons maintain better accuracy. Furthermore, we note that the Conv-LSTM provides the worst performance in the Univariate case, but one of the best in the Multivariate strategy.

In the case of Dogecoin, Figures 18 and 19 reveal that BD-LSTM exhibits the best accuracy for both the Univariate and the Multivariate strategies, and also provides similar stability for higher prediction horizons. This could be due to the price and volatility trends in Figures 10 and 11 (Panels c), where we notice that Dogecoin has a similar trend pre-COVID-19 and during the first phase of COVID-19, the two periods that make up Dataset 1 used for these experiments. Furthermore, we also note that in Figure 13 (Panel c), Dogecoin is least correlated with the Gold stock price, which is the major factor making a difference in the multivariate model. We notice that CNN provides the worst accuracy, followed by the Transformer model, in both strategies.

Finally, we present the results for Litecoin for both the Univariate and Multivariate strategies. Figures 20 and 21 show the results, where we find that Conv-LSTM and BD-LSTM show the best performance for the Univariate strategy, whereas LSTM, ED-LSTM and Conv-LSTM provide the best performance for the Multivariate strategy. On the contrary, the CNN provides the worst performance for the Multivariate strategy, with an RMSE much higher in magnitude when compared to the rest of the models. We also notice that the Multivariate strategy provides better stability as the prediction horizon increases when compared to the Univariate strategy, while the best Univariate models provide much better accuracy than the best Multivariate models.

We summarise the results further in Table 10, which features the model prediction accuracy on the test dataset. We report the RMSE mean and 95% confidence interval for the four cryptocurrencies, and the best models for the different steps are highlighted in bold. The test mean provides the average of the five steps. It is clear that the Univariate models are better than the Multivariate models; however, we find that the accuracy of both strategies is close (test mean) for Bitcoin and Dogecoin.

Data	Strategy	Model	Step 1	Step 2	Step 3	Step 4	Step 5	Test Mean
BTC	Univariate	CNN	0.0380 $\pm$ 0.0015	0.0451 $\pm$ 0.0017	0.0476 $\pm$ 0.0013	0.0537 $\pm$ 0.0015	0.0616 $\pm$ 0.0021	0.0492 $\pm$ 0.0016
		LSTM	0.0223 $\pm$ 0.0013	0.0337 $\pm$ 0.0020	0.0425 $\pm$ 0.0025	0.0496 $\pm$ 0.0026	0.0515 $\pm$ 0.0022	0.0399 $\pm$ 0.0021
		ED-LSTM	0.0250 $\pm$ 0.0014	0.0363 $\pm$ 0.0029	0.0421 $\pm$ 0.0030	0.0448 $\pm$ 0.0028	0.0477 $\pm$ 0.0025	0.0392 $\pm$ 0.0025
		BD-LSTM	0.0196 $\pm$ 0.0008	0.0296 $\pm$ 0.0013	0.0333 $\pm$ 0.0014	0.0385 $\pm$ 0.0012	0.0424 $\pm$ 0.0015	0.0327 $\pm$ 0.0012
		Conv-LSTM	0.0244 $\pm$ 0.0012	0.0302 $\pm$ 0.0011	0.0381 $\pm$ 0.0015	0.0434 $\pm$ 0.0020	0.0468 $\pm$ 0.0023	0.0366 $\pm$ 0.0016
		Transformer	0.0360 $\pm$ 0.0041	0.0431 $\pm$ 0.0038	0.0500 $\pm$ 0.0036	0.0563 $\pm$ 0.0037	0.0617 $\pm$ 0.0039	0.0494 $\pm$ 0.0038
	Multivariate	CNN	0.0416 $\pm$ 0.0011	0.0477 $\pm$ 0.0016	0.0534 $\pm$ 0.0018	0.0575 $\pm$ 0.0019	0.0606 $\pm$ 0.0021	0.0522 $\pm$ 0.0017
		LSTM	0.0310 $\pm$ 0.0018	0.0395 $\pm$ 0.0029	0.0449 $\pm$ 0.0036	0.0458 $\pm$ 0.0031	0.0467 $\pm$ 0.0022	0.0416 $\pm$ 0.0027
		ED-LSTM	0.0290 $\pm$ 0.0022	0.0356 $\pm$ 0.0017	0.0384 $\pm$ 0.0015	0.0405 $\pm$ 0.0013	0.0430 $\pm$ 0.0014	0.0373 $\pm$ 0.0016
		BD-LSTM	0.0247 $\pm$ 0.0018	0.0336 $\pm$ 0.0022	0.0415 $\pm$ 0.0026	0.0462 $\pm$ 0.0026	0.0511 $\pm$ 0.0026	0.0394 $\pm$ 0.0024
		Conv-LSTM	0.0500 $\pm$ 0.0037	0.0466 $\pm$ 0.0019	0.0678 $\pm$ 0.0095	0.0886 $\pm$ 0.0144	0.1327 $\pm$ 0.0266	0.0771 $\pm$ 0.0112
		Transformer	0.0382 $\pm$ 0.0026	0.0418 $\pm$ 0.0024	0.0459 $\pm$ 0.0021	0.0501 $\pm$ 0.0020	0.0526 $\pm$ 0.0022	0.0457 $\pm$ 0.0023
ETH	Univariate	CNN	0.0384 $\pm$ 0.0012	0.0453 $\pm$ 0.0017	0.0478 $\pm$ 0.0026	0.0521 $\pm$ 0.0030	0.0586 $\pm$ 0.0035	0.0484 $\pm$ 0.0024
		LSTM	0.0280 $\pm$ 0.0013	0.0341 $\pm$ 0.0012	0.0372 $\pm$ 0.0011	0.0441 $\pm$ 0.0016	0.0470 $\pm$ 0.0017	0.0381 $\pm$ 0.0014
		ED-LSTM	0.0261 $\pm$ 0.0012	0.0373 $\pm$ 0.0019	0.0405 $\pm$ 0.0018	0.0456 $\pm$ 0.0013	0.0471 $\pm$ 0.0012	0.0393 $\pm$ 0.0015
		BD-LSTM	0.0265 $\pm$ 0.0012	0.0351 $\pm$ 0.0019	0.0401 $\pm$ 0.0019	0.0423 $\pm$ 0.0022	0.0498 $\pm$ 0.0025	0.0388 $\pm$ 0.0019
		Conv-LSTM	0.0327 $\pm$ 0.0020	0.0410 $\pm$ 0.0021	0.0497 $\pm$ 0.0030	0.0609 $\pm$ 0.0048	0.0751 $\pm$ 0.0063	0.0519 $\pm$ 0.0036
		Transformer	0.0337 $\pm$ 0.0035	0.0412 $\pm$ 0.0042	0.0449 $\pm$ 0.0039	0.0521 $\pm$ 0.0032	0.0535 $\pm$ 0.0036	0.0451 $\pm$ 0.0037
	Multivariate	CNN	0.1007 $\pm$ 0.0038	0.1385 $\pm$ 0.0261	0.1564 $\pm$ 0.0444	0.1147 $\pm$ 0.0034	0.1276 $\pm$ 0.0023	0.1276 $\pm$ 0.0160
		LSTM	0.1913 $\pm$ 0.0156	0.1868 $\pm$ 0.0190	0.1993 $\pm$ 0.0178	0.1887 $\pm$ 0.0180	0.1767 $\pm$ 0.0224	0.1886 $\pm$ 0.0186
		ED-LSTM	0.1815 $\pm$ 0.0362	0.1955 $\pm$ 0.0423	0.1862 $\pm$ 0.0441	0.1881 $\pm$ 0.0459	0.1913 $\pm$ 0.0476	0.1885 $\pm$ 0.0432
		BD-LSTM	0.1033 $\pm$ 0.0159	0.1788 $\pm$ 0.0354	0.1799 $\pm$ 0.0314	0.1741 $\pm$ 0.0285	0.1960 $\pm$ 0.0302	0.1664 $\pm$ 0.0283
		Conv-LSTM	0.0799 $\pm$ 0.0243	0.1011 $\pm$ 0.0247	0.1185 $\pm$ 0.0422	0.1786 $\pm$ 0.0526	0.1950 $\pm$ 0.0517	0.1346 $\pm$ 0.0391
		Transformer	0.2464 $\pm$ 0.0141	0.2485 $\pm$ 0.0136	0.2590 $\pm$ 0.0150	0.2508 $\pm$ 0.0147	0.2470 $\pm$ 0.0153	0.2503 $\pm$ 0.0145
DOGE	Univariate	CNN	0.1714 $\pm$ 0.0490	0.2620 $\pm$ 0.0831	0.2786 $\pm$ 0.1056	0.4862 $\pm$ 0.0956	0.5171 $\pm$ 0.0968	0.3431 $\pm$ 0.0860
		LSTM	0.1492 $\pm$ 0.0122	0.1523 $\pm$ 0.0134	0.1577 $\pm$ 0.0123	0.1637 $\pm$ 0.0127	0.1681 $\pm$ 0.0133	0.1582 $\pm$ 0.0128
		ED-LSTM	0.1386 $\pm$ 0.0136	0.1419 $\pm$ 0.0129	0.1401 $\pm$ 0.0127	0.1422 $\pm$ 0.0121	0.1428 $\pm$ 0.0118	0.1411 $\pm$ 0.0126
		BD-LSTM	0.0509 $\pm$ 0.0014	0.0562 $\pm$ 0.0017	0.0722 $\pm$ 0.0025	0.0648 $\pm$ 0.0021	0.0645 $\pm$ 0.0020	0.0617 $\pm$ 0.0019
		Conv-LSTM	0.0590 $\pm$ 0.0198	0.1554 $\pm$ 0.0913	0.1430 $\pm$ 0.0914	0.1456 $\pm$ 0.0893	0.1413 $\pm$ 0.0830	0.1289 $\pm$ 0.0750
		Transformer	0.2116 $\pm$ 0.0361	0.2228 $\pm$ 0.0517	0.2435 $\pm$ 0.0601	0.2219 $\pm$ 0.0503	0.2188 $\pm$ 0.0461	0.2237 $\pm$ 0.0489
	Multivariate	CNN	0.8122 $\pm$ 0.0212	0.6364 $\pm$ 0.0792	0.6725 $\pm$ 0.0700	0.6656 $\pm$ 0.0840	0.5972 $\pm$ 0.0871	0.6768 $\pm$ 0.0683
		LSTM	0.1706 $\pm$ 0.0087	0.1746 $\pm$ 0.0074	0.1828 $\pm$ 0.0083	0.1810 $\pm$ 0.0083	0.1825 $\pm$ 0.0095	0.1783 $\pm$ 0.0084
		ED-LSTM	0.1829 $\pm$ 0.0211	0.1823 $\pm$ 0.0205	0.1816 $\pm$ 0.0200	0.1821 $\pm$ 0.0196	0.1823 $\pm$ 0.0192	0.1822 $\pm$ 0.0201
		BD-LSTM	0.0616 $\pm$ 0.0065	0.0603 $\pm$ 0.0051	0.0619 $\pm$ 0.0039	0.0653 $\pm$ 0.0032	0.0720 $\pm$ 0.0042	0.0642 $\pm$ 0.0046
		Conv-LSTM	0.2103 $\pm$ 0.0975	0.1962 $\pm$ 0.0897	0.1912 $\pm$ 0.0859	0.2347 $\pm$ 0.1085	0.2160 $\pm$ 0.1012	0.2097 $\pm$ 0.0966
		Transformer	0.2472 $\pm$ 0.0200	0.2482 $\pm$ 0.0217	0.2501 $\pm$ 0.0211	0.2345 $\pm$ 0.0215	0.2332 $\pm$ 0.0203	0.2426 $\pm$ 0.0209
LTC	Univariate	CNN	0.0390 $\pm$ 0.0008	0.0470 $\pm$ 0.0015	0.0627 $\pm$ 0.0033	0.0948 $\pm$ 0.0128	0.1131 $\pm$ 0.0170	0.0713 $\pm$ 0.0071
		LSTM	0.0382 $\pm$ 0.0019	0.0479 $\pm$ 0.0024	0.0619 $\pm$ 0.0031	0.0768 $\pm$ 0.0038	0.0861 $\pm$ 0.0038	0.0622 $\pm$ 0.0030
		ED-LSTM	0.0369 $\pm$ 0.0028	0.0487 $\pm$ 0.0025	0.0631 $\pm$ 0.0036	0.0756 $\pm$ 0.0058	0.0838 $\pm$ 0.0066	0.0616 $\pm$ 0.0043
		BD-LSTM	0.0318 $\pm$ 0.0019	0.0401 $\pm$ 0.0019	0.0507 $\pm$ 0.0029	0.0588 $\pm$ 0.0039	0.0699 $\pm$ 0.0034	0.0503 $\pm$ 0.0028
		Conv-LSTM	0.0265 $\pm$ 0.0012	0.0362 $\pm$ 0.0017	0.0443 $\pm$ 0.0015	0.0513 $\pm$ 0.0019	0.0581 $\pm$ 0.0017	0.0433 $\pm$ 0.0016
		Transformer	0.0726 $\pm$ 0.0035	0.0712 $\pm$ 0.0041	0.0746 $\pm$ 0.0046	0.0818 $\pm$ 0.0038	0.0891 $\pm$ 0.0032	0.0779 $\pm$ 0.0038
	Multivariate	CNN	0.4648 $\pm$ 0.0468	0.4553 $\pm$ 0.0444	0.4810 $\pm$ 0.0445	0.5086 $\pm$ 0.0460	0.5167 $\pm$ 0.0492	0.4853 $\pm$ 0.0462
		LSTM	0.0625 $\pm$ 0.0052	0.0788 $\pm$ 0.0070	0.0921 $\pm$ 0.0058	0.0885 $\pm$ 0.0044	0.0919 $\pm$ 0.0044	0.0828 $\pm$ 0.0054
		ED-LSTM	0.0940 $\pm$ 0.0130	0.1121 $\pm$ 0.0125	0.1032 $\pm$ 0.0116	0.0833 $\pm$ 0.0036	0.0880 $\pm$ 0.0037	0.0961 $\pm$ 0.0089
		BD-LSTM	0.0859 $\pm$ 0.0111	0.1520 $\pm$ 0.0268	0.2513 $\pm$ 0.0441	0.2838 $\pm$ 0.0523	0.2760 $\pm$ 0.0501	0.2098 $\pm$ 0.0369
		Conv-LSTM	0.0542 $\pm$ 0.0054	0.0713 $\pm$ 0.0094	0.0952 $\pm$ 0.0126	0.1028 $\pm$ 0.0136	0.1123 $\pm$ 0.0127	0.0872 $\pm$ 0.0107
		Transformer	0.1532 $\pm$ 0.0131	0.1624 $\pm$ 0.0108	0.1724 $\pm$ 0.0083	0.1671 $\pm$ 0.0129	0.1650 $\pm$ 0.0128	0.1640 $\pm$ 0.0116

4.3 Results: Data featuring COVID-19

The previous section presents results given by the respective models using data before COVID-19. We found that the Univariate strategy was better than the Multivariate strategy (Table 10); therefore, we only used the Univariate strategy for Experiment 2 (during COVID-19) and present the results in Table 11. Compared with the results of Experiment 1, the prediction accuracy for Bitcoin, Ethereum, and Dogecoin has improved to a certain extent. The greatest improvement is in forecasting the close price of Dogecoin. After training using data from the COVID-19 period, the prediction performance for Dogecoin is close to that for Bitcoin and Ethereum. Nevertheless, the forecast accuracy for Litecoin decreased. Additionally, we find that the robustness of the models improved after training with data from the COVID-19 period. For the LSTM-based models (LSTM, ED-LSTM and BD-LSTM), the confidence intervals for all prediction horizons of Bitcoin and Ethereum stay within $\pm$ 0.0007. In the case of Dogecoin and Litecoin, the robustness of the models in 1-step ahead prediction has generally improved.

Data	Model	Step 1	Step 2	Step 3	Step 4	Step 5	Test Mean
BTC	BD-LSTM	0.0194 $\pm$ 0.0002	0.0258 $\pm$ 0.0003	0.0311 $\pm$ 0.0004	0.0367 $\pm$ 0.0003	0.0414 $\pm$ 0.0004	0.0309 $\pm$ 0.0003
	ED-LSTM	0.0199 $\pm$ 0.0001	0.0284 $\pm$ 0.0004	0.0339 $\pm$ 0.0006	0.0381 $\pm$ 0.0005	0.0418 $\pm$ 0.0003	0.0324 $\pm$ 0.0004
	LSTM	0.0283 $\pm$ 0.0004	0.0326 $\pm$ 0.0003	0.0369 $\pm$ 0.0003	0.0410 $\pm$ 0.0003	0.0447 $\pm$ 0.0002	0.0367 $\pm$ 0.0003
	CNN	0.0293 $\pm$ 0.0006	0.0342 $\pm$ 0.0006	0.0388 $\pm$ 0.0005	0.0431 $\pm$ 0.0004	0.0467 $\pm$ 0.0003	0.0384 $\pm$ 0.0005
	Conv-LSTM	0.0209 $\pm$ 0.0004	0.0263 $\pm$ 0.0004	0.0317 $\pm$ 0.0004	0.0372 $\pm$ 0.0003	0.0421 $\pm$ 0.0002	0.0316 $\pm$ 0.0004
	Transformer	0.0484 $\pm$ 0.0055	0.0514 $\pm$ 0.0051	0.0546 $\pm$ 0.0047	0.0585 $\pm$ 0.0047	0.0609 $\pm$ 0.0047	0.0548 $\pm$ 0.0050
ETH	BD-LSTM	0.0230 $\pm$ 0.0002	0.0304 $\pm$ 0.0004	0.0361 $\pm$ 0.0005	0.0426 $\pm$ 0.0004	0.0484 $\pm$ 0.0006	0.0361 $\pm$ 0.0004
	ED-LSTM	0.0230 $\pm$ 0.0001	0.0307 $\pm$ 0.0004	0.0362 $\pm$ 0.0004	0.0426 $\pm$ 0.0004	0.0481 $\pm$ 0.0006	0.0361 $\pm$ 0.0004
	LSTM	0.0238 $\pm$ 0.0005	0.0306 $\pm$ 0.0005	0.0361 $\pm$ 0.0005	0.0426 $\pm$ 0.0005	0.0480 $\pm$ 0.0005	0.0362 $\pm$ 0.0005
	CNN	0.0321 $\pm$ 0.0004	0.0397 $\pm$ 0.0008	0.0481 $\pm$ 0.0014	0.0536 $\pm$ 0.0016	0.0585 $\pm$ 0.0015	0.0464 $\pm$ 0.0012
	Conv-LSTM	0.0250 $\pm$ 0.0004	0.0332 $\pm$ 0.0007	0.0405 $\pm$ 0.0013	0.0481 $\pm$ 0.0020	0.0550 $\pm$ 0.0025	0.0404 $\pm$ 0.0014
	Transformer	0.0269 $\pm$ 0.0008	0.0359 $\pm$ 0.0010	0.0411 $\pm$ 0.0011	0.0486 $\pm$ 0.0017	0.0542 $\pm$ 0.0019	0.0413 $\pm$ 0.0013
DOGE	BD-LSTM	0.0291 $\pm$ 0.0001	0.0618 $\pm$ 0.0041	0.0673 $\pm$ 0.0046	0.0742 $\pm$ 0.0044	0.0795 $\pm$ 0.0041	0.0624 $\pm$ 0.0035
	ED-LSTM	0.0290 $\pm$ 0.0001	0.0626 $\pm$ 0.0030	0.0656 $\pm$ 0.0029	0.0708 $\pm$ 0.0026	0.0757 $\pm$ 0.0024	0.0607 $\pm$ 0.0022
	LSTM	0.0660 $\pm$ 0.0039	0.0683 $\pm$ 0.0042	0.0728 $\pm$ 0.0047	0.0792 $\pm$ 0.0048	0.0835 $\pm$ 0.0043	0.0740 $\pm$ 0.0044
	CNN	0.0646 $\pm$ 0.0041	0.0630 $\pm$ 0.0040	0.0655 $\pm$ 0.0039	0.0762 $\pm$ 0.0039	0.0806 $\pm$ 0.0031	0.0700 $\pm$ 0.0038
	Conv-LSTM	0.0538 $\pm$ 0.0021	0.0593 $\pm$ 0.0022	0.0612 $\pm$ 0.0020	0.0662 $\pm$ 0.0020	0.0726 $\pm$ 0.0019	0.0626 $\pm$ 0.0020
	Transformer	0.0536 $\pm$ 0.0071	0.0586 $\pm$ 0.0077	0.0639 $\pm$ 0.0073	0.0724 $\pm$ 0.0070	0.0789 $\pm$ 0.0064	0.0655 $\pm$ 0.0071
LTC	BD-LSTM	0.0577 $\pm$ 0.0007	0.0797 $\pm$ 0.0022	0.0968 $\pm$ 0.0018	0.1096 $\pm$ 0.0015	0.1205 $\pm$ 0.0016	0.0929 $\pm$ 0.0016
	ED-LSTM	0.0578 $\pm$ 0.0006	0.0797 $\pm$ 0.0012	0.0962 $\pm$ 0.0010	0.1096 $\pm$ 0.0009	0.1198 $\pm$ 0.0009	0.0926 $\pm$ 0.0009
	LSTM	0.0587 $\pm$ 0.0021	0.0804 $\pm$ 0.0017	0.0971 $\pm$ 0.0015	0.1100 $\pm$ 0.0014	0.1207 $\pm$ 0.0013	0.0934 $\pm$ 0.0016
	CNN	0.0823 $\pm$ 0.0011	0.1060 $\pm$ 0.0042	0.1163 $\pm$ 0.0025	0.1268 $\pm$ 0.0029	0.1403 $\pm$ 0.0049	0.1143 $\pm$ 0.0031
	Conv-LSTM	0.0809 $\pm$ 0.0073	0.1043 $\pm$ 0.0095	0.1201 $\pm$ 0.0087	0.1286 $\pm$ 0.0086	0.1383 $\pm$ 0.0072	0.1145 $\pm$ 0.0083
	Transformer	0.0890 $\pm$ 0.0056	0.1046 $\pm$ 0.0048	0.1163 $\pm$ 0.0042	0.1273 $\pm$ 0.0038	0.1354 $\pm$ 0.0033	0.1146 $\pm$ 0.0043

Data	Strategy	LSTM	ED-LSTM	BD-LSTM	CNN	Conv-LSTM	Transformer
BTC	Univariate	4	3	1	5	2	6
BTC	Multivariate	3	1	2	5	6	4
ETH	Univariate	1	3	2	5	6	4
ETH	Multivariate	5	4	3	1	2	6
DOGE	Univariate	4	3	1	6	2	5
DOGE	Multivariate	2	3	1	6	4	5
LTC	Univariate	4	3	2	5	1	6
LTC	Multivariate	1	3	5	6	2	4
Mean Rank		3	2.875	2.125	4.875	3.125	5

Data	LSTM	ED-LSTM	BD-LSTM	CNN	Conv-LSTM	Transformer
BTC	4	3	1	5	2	6
ETH	3	2	1	4	5	6
DOGE	6	4	1	5	2	3
LTC	3	1	2	4	5	6
Mean Rank	4	2.5	1.25	4.5	3.5	5.25

5 Discussion

In this section, we provide a discussion based on the results, taking into consideration the model architecture as well as the characteristics of the data. In summary, we evaluated the predictive performance of all models and presented the results in Figures 14 to 21. We also provide a ranking of the prediction accuracy for each type of model in Tables 12 and 13.
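The ranking behind such tables can be sketched as follows, assuming that ranks are assigned by ascending test-mean RMSE (1 = best) and that the mean rank is the plain average over rows. Applied to the Bitcoin test means in Table 10, this assumption reproduces the two Bitcoin rows of Table 12. The function names `rank_models` and `mean_rank` are hypothetical.

```python
def rank_models(rmse_by_model):
    """Rank models by ascending RMSE; rank 1 is the most accurate model."""
    ordered = sorted(rmse_by_model, key=rmse_by_model.get)
    return {model: position for position, model in enumerate(ordered, start=1)}


def mean_rank(rank_rows):
    """Average each model's rank over several dataset/strategy rows."""
    models = rank_rows[0].keys()
    return {m: sum(row[m] for row in rank_rows) / len(rank_rows) for m in models}


# Test-mean RMSE for Bitcoin, copied from Table 10.
btc_univariate = {
    "LSTM": 0.0399, "ED-LSTM": 0.0392, "BD-LSTM": 0.0327,
    "CNN": 0.0492, "Conv-LSTM": 0.0366, "Transformer": 0.0494,
}
btc_multivariate = {
    "LSTM": 0.0416, "ED-LSTM": 0.0373, "BD-LSTM": 0.0394,
    "CNN": 0.0522, "Conv-LSTM": 0.0771, "Transformer": 0.0457,
}

rows = [rank_models(btc_univariate), rank_models(btc_multivariate)]
for model, value in mean_rank(rows).items():
    print(f"{model}: mean rank {value}")
```

The mean rank in Table 12 averages over all eight dataset and strategy rows; the sketch uses only the two Bitcoin rows to stay short.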

We first review the results of the first experiment, which investigated the model performance with COVID-19 data. Our results show that LSTM, BD-LSTM and ED-LSTM provide outstanding predictive performance across four different cryptocurrencies and two different approaches for selecting model variables. The CNN, Conv-LSTM, and Transformer models show good performance only under particular conditions. We also note that for all the cryptocurrencies, the Univariate model outperformed the Multivariate model, but in some cases (Bitcoin and Dogecoin) the accuracy was close when comparing both strategies. We found that models with better prediction accuracy (lower RMSE) were mostly accompanied by narrower confidence intervals, i.e. they were more robust to different model weight initializations in independent experimental runs. On the contrary, higher RMSE values usually came with lower robustness.

The prediction accuracy generally declines as the prediction horizon increases, which is natural for multistep ahead problems (Figure 14b). This is because our task is defined as a direct approach for forecasting multiple steps, rather than an iterated prediction strategy: every step is predicted from the current values, so the information gap expands as the prediction step increases. We observe how the accuracy of each model changes over the prediction horizon and find that CNN and Conv-LSTM are significantly worse than the other models. The forecast accuracy of these two models frequently declines more rapidly than that of other models, and occasionally even fluctuates (Figure 18b) across Steps 1 to 5. We found that the CNN-related models using the convolution operation provided lower accuracy than other models in predicting cryptocurrency prices; we analyse the cause of this issue later in this section.

Among the predictions for the four currencies in Experiment 1 (Table 10), Dogecoin has the worst prediction accuracy. The RMSE values of the models are significantly higher than those for the other three cryptocurrencies. We believe this is due to the particular behaviour of the Dogecoin price, which leads to large prediction errors: the first 70% of the data fluctuates smoothly, while the last 30% fluctuates violently.
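To make the direct multi-step setup concrete, the following sketch builds input windows paired with multi-step targets and scores a forecast separately at each horizon. The window length of 3, the toy series, and the persistence forecast (repeating the last observed value) are illustrative assumptions rather than the configuration used in the study; the function names are hypothetical.

```python
import math


def make_direct_windows(series, n_in, n_out):
    """Pair each input window of n_in values with the next n_out values."""
    inputs, targets = [], []
    for start in range(len(series) - n_in - n_out + 1):
        inputs.append(series[start:start + n_in])
        targets.append(series[start + n_in:start + n_in + n_out])
    return inputs, targets


def rmse_per_step(targets, predictions):
    """Compute one RMSE value per prediction horizon."""
    n_out = len(targets[0])
    return [
        math.sqrt(
            sum((t[h] - p[h]) ** 2 for t, p in zip(targets, predictions))
            / len(targets)
        )
        for h in range(n_out)
    ]


# Toy linear series (not real price data).
series = [float(i) for i in range(10)]
inputs, targets = make_direct_windows(series, n_in=3, n_out=2)

# Persistence baseline: every horizon is predicted from the same last input.
persistence = [[window[-1]] * 2 for window in inputs]
print(rmse_per_step(targets, persistence))
```

Because every horizon is predicted from the same input window, the error of this toy baseline grows with the step, which mirrors the widening information gap described above.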

In the second experiment, we evaluate the accuracy of the deep learning models in predicting cryptocurrency prices with a new dataset. We utilize the two models that performed best in the previous experiment for forecasting with the new dataset, which includes all close prices from the start of COVID-19 to April 2024. We find that the close-price forecasts for Bitcoin, Dogecoin, and Ethereum have all improved. Also, as the prediction horizon increases, the prediction accuracy of the models deteriorates more slowly. We attribute the decrease in the accuracy of forecasting Litecoin to two causes. First, because our selection criterion is the overall performance of the model, we did not choose Conv-LSTM, which had the best performance in predicting Litecoin in the previous experiment; Conv-LSTM may be more suitable for predicting Litecoin. Second, Litecoin experienced violent price swings before COVID-19 and, due to our experimental design, the data from this period was not included in the training set.

Next, we investigate what might be contributing to the lower accuracy of multivariate models compared to univariate models. In our analysis (Fig or Table xxx), we notice that the price of cryptocurrencies is extremely unstable and is greatly influenced by several variables inside and outside of the market [10]. Simply adding some extra factors will not only fail to help the model forecast accurately, but may also mislead the model into learning irrelevant data features.

Our analysis of volatility (Figures 10 and 11) shows the high degree of volatility exhibited by cryptocurrencies throughout the COVID-19 pandemic. Through a comparative analysis of the outcomes of the first and second experiments, we observed that using high-volatility data as the training set provides better prediction accuracy for the model. The robustness and scalability of the model are also improved.
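One common way to quantify close-price volatility is the rolling standard deviation of log returns, sketched below. This definition, the window length, and the toy price lists are assumptions for illustration and may differ from the volatility measure behind Figures 10 and 11; the function names are hypothetical.

```python
import math
import statistics


def log_returns(prices):
    """Log return between each pair of consecutive close prices."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


def rolling_volatility(prices, window):
    """Sample standard deviation of log returns over a sliding window.

    The window must cover at least two returns for the deviation to exist.
    """
    returns = log_returns(prices)
    return [
        statistics.stdev(returns[end - window:end])
        for end in range(window, len(returns) + 1)
    ]


# Toy close prices (not real market data).
calm = [100.0, 101.0, 100.0, 101.0, 100.0]
wild = [100.0, 120.0, 90.0, 130.0, 80.0]

print(rolling_volatility(calm, window=3))
print(rolling_volatility(wild, window=3))
```

Under this definition, a training period with larger price swings yields larger values, which is the sense in which a dataset can be called high-volatility.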

Next, we investigate the factors contributing to the advantages and disadvantages of each model. Conventional time series models and machine learning techniques are inadequate for addressing issues such as temporal dependency and gradient explosion. When we used the MLP and ARIMA models to predict Bitcoin, the prediction performance was not as good as that of the deep learning models. Moreover, according to [139], there is long-term memory in the cryptocurrency market. The LSTM network, a deep learning model initially designed to address long-term dependency issues, distinguishes itself in this regard: the memory cell and gates in the LSTM network can better capture information in time series with long-term dependencies. The prediction performance of CNN is worse than that of LSTM, which is what we expected. Because the convolutional layer in CNN is better at capturing local patterns and features, it gives better prediction results for sequences with an obvious spatial correlation between data points. According to past research, CNN seems to be more effective in handling image recognition problems.

Next, we analyze the differences between the four models with LSTM layers (LSTM, ED-LSTM, BD-LSTM and Conv-LSTM). The ED-LSTM model was created for language modeling problems, particularly for sequence-to-sequence modeling in language translation. In this model, an encoder LSTM is used to transform a source sequence into a fixed-length vector, while a decoder LSTM is employed to convert the vector representation back into a variable-length target sequence [105]. In our study, the encoder maps an input time series to a vector of fixed length. Subsequently, the decoder LSTM translates the vector representation to several prediction horizons. Despite the differences in the application, the fundamental objective of mapping inputs to outputs remains unchanged. As a result, ED-LSTM models have proven to be highly efficient for multi-step ahead prediction. BD-LSTMs utilize two LSTM models to capture both forward and backward information about the sequence at each time step [102]. While these models have been shown to be effective for language modeling, our findings indicate that they are also useful in tracking both present and future states for time series modeling. In our experiments, BD-LSTM and ED-LSTM provided stable and outstanding prediction performance.

Although Conv-LSTM uses convolutional layers as input with LSTM memory cells in the hidden layer, it differs from the conventional LSTM models since the memory cells from different hidden layers update only in the time domain and are mutually independent [140]. Therefore, information at the top layer at time $t-1$ will be ignored by the bottom layer at time $t$. In cryptocurrency price prediction, the time information is crucial. This also explains why the prediction accuracy of Conv-LSTM in our experiments is often low and unstable at high prediction horizons. The Transformer model that we employed also provided unsatisfactory results, which may be attributed to the limited training data, as Transformer models are often better suited for handling large amounts of data.

6 Conclusions

In this study, we provide a rigorous evaluation of novel deep learning models for cryptocurrency price forecasting. We compared prominent deep learning models using univariate and multivariate strategies. The results show that the Bidirectional LSTM (BD-LSTM) provides the highest accuracy in predicting cryptocurrency prices. We also provided a comparison with baseline models such as the multilayer perceptron and ARIMA and found that deep learning models generally outperform them. We also found that multivariate models provided lower prediction accuracy than univariate models; however, they have scope for improvement given the availability of more highly correlated time series data as features. In terms of the effect of COVID-19, we found that close-price volatility for cryptocurrencies is quite apparent. Our experimental results show that utilising a training dataset with high volatility enhances the accuracy of our predictions.

In future work, it would be worthwhile to improve the multivariate model. It is advisable to utilise more dependable factors to enhance forecasts, possibly employing techniques such as causal inference to find these variables. The framework of this study can also be applied to the prediction of other financial indicators, such as the volatility of cryptocurrencies. Further applications to other specific problems could also be viable, such as predicting energy use and forecasting extreme weather.

7 Code and Data

We provide open source code and data in a GitHub repository: https://github.com/sydney-machine-learning/deeplearning-crypto

References

[1] S. Bose, G. Dong, A. Simpson, The financial ecosystem, Springer, 2019.
[2] J. Frankel, B. Smit, F. Sturzenegger, Fiscal and monetary policy in a commodity-based economy, Economics of Transition 16 (4) (2008) 679–713.
[3] F. A. Hayek, Denationalisation of money: the argument refined: an analysis of the theory and practice of concurrent currencies, Vol. 70, Institute of Economic Affairs, 1990.
[4] M. M. Gross, C. Siebenbrunner, Money creation in fiat and digital currency systems, International Monetary Fund, 2019.
[5] J. H. Boyd, R. Levine, B. D. Smith, The impact of inflation on financial sector performance, Journal of Monetary Economics 47 (2) (2001) 221–248.
[6] U. Milkau, J. Bott, Digitalisation in payments: From interoperability to centralised models?, Journal of Payments Strategy & Systems 9 (3) (2015) 321–340.
[7] D. Chaum, Blind signatures for untraceable payments, in: Advances in Cryptology: Proceedings of Crypto 82, Springer, 1983, pp. 199–203.
[8] S. Nakamoto, Bitcoin: A peer-to-peer electronic cash system (2008).
[9] A. Manimuthu, G. Rejikumar, D. Marwaha, et al., A literature review on Bitcoin: transformation of crypto currency into a global phenomenon, IEEE Engineering Management Review 47 (1) (2019) 28–35.
[10] R. Farell, An analysis of the cryptocurrency industry, Wharton Research Scholars 130 (2015) 1–23.
[11] I. Eyal, Blockchain technology: Transforming libertarian cryptocurrency dreams to finance and banking realities, Computer 50 (9) (2017) 38–49.
[12] H. Jang, J. Lee, An empirical study on modeling and prediction of Bitcoin prices with Bayesian neural networks based on blockchain information, IEEE Access 6 (2017) 5427–5437.
[13] M. Saad, J. Choi, D. Nyang, J. Kim, A. Mohaisen, Toward characterizing blockchain-based cryptocurrencies for highly accurate predictions, IEEE Systems Journal 14 (1) (2019) 321–332.
[14] S. Corbet, B. Lucey, L. Yarovaya, Datestamping the Bitcoin and Ethereum bubbles, Finance Research Letters 26 (2018) 81–88.
[15] J. Bhosale, S. Mavale, Volatility of select crypto-currencies: A comparison of Bitcoin, Ethereum and Litecoin, Annu. Res. J. SCMS, Pune 6 (1) (2018) 132–141.
[16] P. Katsiampa, An empirical investigation of volatility dynamics in the cryptocurrency market, Research in International Business and Finance 50 (2019) 322–335.
[17] H. Elendner, S. Trimborn, B. Ong, T. M. Lee, The cross-section of crypto-currencies as financial assets: An overview (2016).
[18] P. L. Seabe, C. R. B. Moutsinga, E. Pindza, Forecasting cryptocurrency prices using LSTM, GRU, and bi-directional LSTM: A deep learning approach, Fractal and Fractional 7 (2) (2023) 203.
[19] N. Kyriazis, S. Papadamou, S. Corbet, A systematic review of the bubble dynamics of cryptocurrency prices, Research in International Business and Finance 54 (2020) 101254.
[20] M. A. Ammer, T. H. Aldhyani, Deep learning algorithm to predict cryptocurrency fluctuation prices: Increasing investment awareness, Electronics 11 (15) (2022) 2349.
[21] K. Murray, A. Rossi, D. Carraro, A. Visentin, On forecasting cryptocurrency prices: A comparison of machine learning, deep learning, and ensembles, Forecasting 5 (1) (2023) 196–209.
[22] Y. Wang, G. Andreeva, B. Martin-Barragan, Machine learning approaches to forecasting cryptocurrency volatility: Considering internal and external determinants, International Review of Financial Analysis 90 (2023) 102914.
[23] N. A. Kyriazis, A survey on empirical findings about spillovers in cryptocurrency markets, Journal of Risk and Financial Management 12 (4) (2019) 170.
[24] J. H. Stock, M. W. Watson, Vector autoregressions, Journal of Economic Perspectives 15 (4) (2001) 101–115.
[25] J.-C. Duan, The GARCH option pricing model, Mathematical Finance 5 (1) (1995) 13–32.
[26] T. L. D. Huynh, M. A. Nasir, X. V. Vo, T. T. Nguyen, “Small things matter most”: The spillover effects in the cryptocurrency market and gold as a silver bullet, The North American Journal of Economics and Finance 54 (2020) 101277.
[27] T. Baltrušaitis, C. Ahuja, L.-P. Morency, Multimodal machine learning: A survey and taxonomy, IEEE Transactions on Pattern Analysis and Machine Intelligence 41 (2) (2018) 423–443.
[28] S. Wang, J. Cao, S. Y. Philip, Deep learning for spatio-temporal data mining: A survey, IEEE Transactions on Knowledge and Data Engineering 34 (8) (2020) 3681–3700.
[29] B. Lim, S. Zohren, Time-series forecasting with deep learning: a survey, Philosophical Transactions of the Royal Society A 379 (2194) (2021) 20200209.
[30] V. Jacques-Dumas, F. Ragone, P. Borgnat, P. Abry, F. Bouchet, Deep learning-based extreme heatwave forecast, Frontiers in Climate 4 (2022).
[31] S. Mahjoub, L. Chrifi-Alaoui, B. Marhic, L. Delahoche, Predicting energy consumption using LSTM, multi-layer GRU and drop-GRU neural networks, Sensors 22 (11) (2022) 4062.
[32] R. Chandra, Y. He, Bayesian neural networks for stock price forecasting before and during COVID-19 pandemic, PLoS ONE 16 (7) (2021) e0253217.
[33] I. E. Livieris, E. Pintelas, S. Stavroyiannis, P. Pintelas, Ensemble deep learning models for forecasting cryptocurrency time-series, Algorithms 13 (5) (2020) 121.
[34] F. Ferdiansyah, S. H. Othman, R. Z. R. M. Radzi, D. Stiawan, Y. Sazaki, U. Ependi, A LSTM-method for Bitcoin price prediction: A case study Yahoo Finance stock market, in: 2019 International Conference on Electrical Engineering and Computer Science (ICECOS), IEEE, 2019, pp. 206–210.
[35] C.-H. Wu, C.-C. Lu, Y.-F. Ma, R.-S. Lu, A new forecasting framework for Bitcoin price with LSTM, in: 2018 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, 2018, pp. 168–175.
[36] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (1997) 1735–1780.
[37] Y. Yu, X. Si, C. Hu, J. Zhang, A review of recurrent neural networks: LSTM cells and network architectures, Neural Computation 31 (7) (2019) 1235–1270.
[38] Z. Jiang, J. Liang, Cryptocurrency portfolio management with deep reinforcement learning, in: 2017 Intelligent Systems Conference (IntelliSys), IEEE, 2017, pp. 905–913.
[39] S. Sridhar, S. Sanagavarapu, Multi-head self-attention transformer for Dogecoin price prediction, in: 2021 14th International Conference on Human System Interaction (HSI), IEEE, 2021, pp. 1–6.
[40] R. Chandra, S. Goyal, R. Gupta, Evaluation of deep learning models for multi-step ahead time series prediction, IEEE Access 9 (2021) 83105–83123.
[41] R. S. Tsay, Analysis of financial time series, John Wiley & Sons, 2005.
[42] T. Fischer, C. Krauss, Deep learning with long short-term memory networks for financial market predictions, European Journal of Operational Research 270 (2) (2018) 654–669.
[43] O. B. Sezer, M. U. Gudelek, A. M. Ozbayoglu, Financial time series forecasting with deep learning: A systematic literature review: 2005–2019, Applied Soft Computing 90 (2020) 106181.
[44] V. Plakandaras, T. Papadimitriou, P. Gogas, K. Diamantaras, Market sentiment and exchange rate directional forecasting, Algorithmic Finance 4 (1-2) (2015) 69–79.
[45] M. Nabipour, P. Nayyeri, H. Jabani, S. Shahab, A. Mosavi, Predicting stock market trends using machine learning and deep learning algorithms via continuous and binary data; a comparative analysis, IEEE Access 8 (2020) 150199–150212.
[46]A.Kong, H.Zhu, R.Azencott, Predicting intraday jumps in stock prices using liquidity measures and technical indicators, Journal of Forecasting 40(3) (2021) 416–438.
[47]J.L. Elman, Finding structure in time, Cognitive science 14(2) (1990) 179–211.
[48]S.Mehtab, J.Sen, A.Dutta, Stock price prediction using machine learning and lstm-based deep learning models, in: Machine Learning and Metaheuristics Algorithms, and Applications: Second Symposium, SoMMA 2020, Chennai, India, October 14–17, 2020, Revised Selected Papers 2, Springer, 2021, pp. 88–106.
[49]H.Rezaei, H.Faaljou, G.Mansourfar, Stock price prediction using deep learning and frequency decomposition, Expert Systems with Applications 169 (2021) 114332.
[50]G.Rilling, P.Flandrin, P.Goncalves, et al., On empirical mode decomposition and its algorithms, in: IEEE-EURASIP workshop on nonlinear signal and image processing, Vol.3, Grado: IEEE, 2003, pp. 8–11.
[51]M.E. Torres, M.A. Colominas, G.Schlotthauer, P.Flandrin, A complete ensemble empirical mode decomposition with adaptive noise, in: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, 2011, pp. 4144–4147.
[52]N.Jing, Z.Wu, H.Wang, A hybrid model integrating deep learning with investor sentiment analysis for stock price prediction, Expert Systems with Applications 178 (2021) 115019.
[53]S.Mehtab, J.Sen, Stock price prediction using machine learning and deep learning algorithms and models, Machine Learning in the Analysis and Forecasting of Financial Time Series (2022) 235–303.
[54]Y.Li, Y.Pan, A novel ensemble deep learning model for stock prediction based on stock prices and news, International Journal of Data Science and Analytics 13(2) (2022) 139–149.
[55]A.Kanwal, M.F. Lau, S.P. Ng, K.Y. Sim, S.Chandrasekaran, BiCuDNNLSTM-1dCNN: a hybrid deep learning-based predictive model for stock price prediction, Expert Systems with Applications 202 (2022) 117123.
[56]T.Swathi, N.Kasiviswanath, A.A. Rao, An optimal deep learning-based LSTM for stock price prediction using Twitter sentiment analysis, Applied Intelligence 52(12) (2022) 13675–13688.
[57]H.Ben Ameur, S.Boubaker, Z.Ftiti, W.Louhichi, K.Tissaoui, Forecasting commodity prices: empirical evidence using deep learning tools, Annals of Operations Research (2023) 1–19.
[58]P.Baser, J.R. Saini, N.Baser, Gold commodity price prediction using tree-based prediction models, International Journal of Intelligent Systems and Applications in Engineering 11(1s) (2023) 90–96.
[59]S.Deepa, A.Alli, S.Gokila, et al., Machine learning regression model for material synthesis prices prediction in agriculture, Materials Today: Proceedings 81 (2023) 989–993.
[60]Y.Zhao, G.Yang, Deep learning-based integrated framework for stock price movement prediction, Applied Soft Computing 133 (2023) 109921.
[61]J.Almeida, S.Tata, A.Moser, V.Smit, Bitcoin prediction using ANN, Neural networks 7 (2015) 1–12.
[62]D.C. Mallqui, R.A. Fernandes, Predicting the direction, maximum, minimum and closing prices of daily bitcoin exchange rate using machine learning techniques, Applied Soft Computing 75 (2019) 596–606.
[63]S.G. Quek, G.Selvachandran, J.H. Tan, H.Y.A. Thiang, N.T. Tuan, et al., A new hybrid model of fuzzy time series and genetic algorithm based machine learning algorithm: a case study of forecasting prices of nine types of major cryptocurrencies, Big Data Research 28 (2022) 100315.
[64]A.Radityo, Q.Munajat, I.Budi, Prediction of bitcoin exchange rate to american dollar using artificial neural network methods, in: 2017 international conference on advanced computer science and information systems (ICACSIS), IEEE, 2017, pp. 433–438.
[65]A.Greaves, B.Au, Using the bitcoin transaction graph to predict the price of bitcoin, No data 8 (2015) 416–443.
[66]C.Cortes, V.Vapnik, Support-vector networks, Machine learning 20 (1995) 273–297.
[67]Y.Sovbetov, Factors influencing cryptocurrency prices: Evidence from bitcoin, ethereum, dash, litcoin, and monero, Journal of Economics and Financial Analysis 2(2) (2018) 1–27.
[68]T.Guo, A.Bifet, N.Antulov-Fantulin, Bitcoin volatility forecasting with a glimpse into buy and sell orders, in: 2018 IEEE international conference on data mining (ICDM), IEEE, 2018, pp. 989–994.
[69]C.G. Akcora, A.K. Dey, Y.R. Gel, M.Kantarcioglu, Forecasting bitcoin price with graph chainlets, in: Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part III 22, Springer, 2018, pp. 765–776.
[70]S.Roy, S.Nanjiba, A.Chakrabarty, Bitcoin price forecasting using time series analysis, in: 2018 21st International Conference of Computer and Information Technology (ICCIT), IEEE, 2018, pp. 1–5.
[71]V.Derbentsev, N.Datsenko, O.Stepanenko, V.Bezkorovainyi, Forecasting cryptocurrency prices time series using machine learning approach, in: SHS Web of Conferences, Vol.65, EDP Sciences, 2019, p. 02001.
[72]S.Aanandhi, S.Akhilaa, V.Vardarajan, M.Sathiyanarayanan, et al., Cryptocurrency price prediction using time series forecasting (ARIMA), in: 2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), IEEE, 2021, pp. 598–602.
[73]N.Latif, J.D. Selvam, M.Kapse, V.Sharma, V.Mahajan, Comparative performance of LSTM and ARIMA for the short-term prediction of bitcoin prices, Australasian Accounting, Business and Finance Journal 17(1) (2023) 256–276.
[74]N.Maleki, A.Nikoubin, M.Rabbani, Y.Zeinali, Bitcoin price prediction based on other cryptocurrencies using machine learning and time series analysis, Scientia Iranica 30(1) (2023) 285–301.
[75]L.P. Kaelbling, M.L. Littman, A.W. Moore, Reinforcement learning: A survey, Journal of artificial intelligence research 4 (1996) 237–285.
[76]K.Lee, S.Ulkuatam, P.Beling, W.Scherer, Generating synthetic bitcoin transactions and predicting market price movement via inverse reinforcement learning and agent-based modeling, Journal of Artificial Societies and Social Simulation 21(3) (2018).
[77]B.Ly, D.Timaul, A.Lukanan, J.Lau, E.Steinmetz, Applying deep learning to better predict cryptocurrency trends, in: Midwest Instruction and Computing Symposium, 2018.
[78]G.Lucarelli, M.Borrotti, A deep reinforcement learning approach for automated cryptocurrency trading, in: Artificial Intelligence Applications and Innovations: 15th IFIP WG 12.5 International Conference, AIAI 2019, Hersonissos, Crete, Greece, May 24–26, 2019, Proceedings 15, Springer, 2019, pp. 247–258.
[79]S.Lahmiri, S.Bekiros, Cryptocurrency forecasting with deep learning chaotic neural networks, Chaos, Solitons & Fractals 118 (2019) 35–40.
[80]M.M. Patel, S.Tanwar, R.Gupta, N.Kumar, A deep learning-based cryptocurrency price prediction scheme for financial institutions, Journal of information security and applications 55 (2020) 102583.
[81]S.Marne, S.Churi, D.Correia, J.Gomes, Predicting price of cryptocurrency–a deep learning approach, NTASU-9 (3) (2020).
[82]S.Nasekin, C.Y.-H. Chen, Deep learning-based cryptocurrency sentiment construction, Digital Finance 2(1) (2020) 39–67.
[83]C.Betancourt, W.-H. Chen, Deep reinforcement learning for portfolio management of markets with a dynamic number of assets, Expert Systems with Applications 164 (2021) 114002.
[84]K.Arulkumaran, M.P. Deisenroth, M.Brundage, A.A. Bharath, Deep reinforcement learning: A brief survey, IEEE Signal Processing Magazine 34(6) (2017) 26–38.
[85]Z.Shahbazi, Y.-C. Byun, Improving the cryptocurrency price prediction performance based on reinforcement learning, IEEE Access 9 (2021) 162651–162659.
[86]V.D’Amato, S.Levantesi, G.Piscopo, Deep learning in predicting cryptocurrency volatility, Physica A: Statistical Mechanics and its Applications 596 (2022) 127158.
[87]M.Schnaubelt, Deep reinforcement learning for the optimal placement of cryptocurrency limit orders, European Journal of Operational Research 296(3) (2022) 993–1006.
[88]R.Parekh, N.P. Patel, N.Thakkar, R.Gupta, S.Tanwar, G.Sharma, I.E. Davidson, R.Sharma, DL-GuesS: Deep learning and sentiment analysis-based cryptocurrency price prediction, IEEE Access 10 (2022) 35398–35409.
[89]G.Kim, D.-H. Shin, J.G. Choi, S.Lim, A deep learning-based cryptocurrency price prediction model that uses on-chain data, IEEE Access 10 (2022) 56232–56248.
[90]S.Goutte, H.-V. Le, F.Liu, H.-J. Von Mettenheim, Deep learning and technical analysis in cryptocurrency market, Finance Research Letters 54 (2023) 103809.
[91]K.-C. Yen, H.-P. Cheng, Economic policy uncertainty and cryptocurrency volatility, Finance Research Letters 38 (2021) 101428.
[92]F.Woebbeking, Cryptocurrency volatility markets, Digital finance 3(3) (2021) 273–298.
[93]J.L. Cross, C.Hou, K.Trinh, Returns, volatility and the cryptocurrency bubble of 2017–18, Economic Modelling 104 (2021) 105643.
[94]Z.Ftiti, W.Louhichi, H.Ben Ameur, Cryptocurrency volatility forecasting: What can we learn from the first wave of the COVID-19 outbreak?, Annals of Operations Research 330(1) (2023) 665–690.
[95]L.Yin, J.Nie, L.Han, Understanding cryptocurrency volatility: The role of oil market shocks, International Review of Economics & Finance 72 (2021) 233–253.
[96]L.Catania, S.Grassi, F.Ravazzolo, Predicting the volatility of cryptocurrency time-series, Mathematical and Statistical Methods for Actuarial Sciences and Finance: MAF 2018 (2018) 203–207.
[97]L.Catania, S.Grassi, Forecasting cryptocurrency volatility, International Journal of Forecasting 38(3) (2022) 878–894.
[98]F.Ma, C.Liang, Y.Ma, M.I.M. Wahab, Cryptocurrency volatility forecasting: A Markov regime-switching MIDAS approach, Journal of Forecasting 39(8) (2020) 1277–1290.
[99]Y.Wei, Y.Wang, B.M. Lucey, S.A. Vigne, Cryptocurrency uncertainty and volatility forecasting of precious metal futures markets, Journal of Commodity Markets 29 (2023) 100305.
[100]G.E. Box, G.M. Jenkins, G.C. Reinsel, G.M. Ljung, Time series analysis: forecasting and control, John Wiley & Sons, 2015.
[101]S.Hochreiter, The vanishing gradient problem during learning recurrent neural nets and problem solutions, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6(02) (1998) 107–116.
[102]A.Graves, J.Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural networks 18(5-6) (2005) 602–610.
[103]Y.Liu, C.Sun, L.Lin, X.Wang, Learning natural language inference using bidirectional LSTM model and inner-attention, arXiv preprint arXiv:1605.09090 (2016).
[104]L.Chen, J.Tao, S.Ghaffarzadegan, Y.Qian, End-to-end neural network based automated speech scoring, in: 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, 2018, pp. 6234–6238.
[105]I.Sutskever, O.Vinyals, Q.V. Le, Sequence to sequence learning with neural networks, Advances in neural information processing systems 27 (2014).
[106]K.Cho, B.Van Merriënboer, C.Gulcehre, D.Bahdanau, F.Bougares, H.Schwenk, Y.Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078 (2014).
[107]H.Gunduz, Y.Yaslan, Z.Cataltepe, Intraday prediction of Borsa Istanbul using convolutional neural networks and feature correlations, Knowledge-Based Systems 137 (2017) 138–148.
[108]L.Di Persio, O.Honchar, et al., Artificial neural networks architectures for stock price prediction: Comparisons and applications, International journal of circuits, systems and signal processing 10 (2016) 403–413.
[109]E.Hoseinzade, S.Haratizadeh, CNNpred: CNN-based stock market prediction using a diverse set of variables, Expert Systems with Applications 129 (2019) 273–285.
[110]A.Siripurapu, Convolutional networks for stock trading, Stanford Univ Dep Comput Sci 1(2) (2014) 1–6.
[111]H.Jiang, E.Learned-Miller, Face detection with the Faster R-CNN, in: 2017 12th IEEE international conference on automatic face & gesture recognition (FG 2017), IEEE, 2017, pp. 650–657.
[112]A.Garcia-Garcia, S.Orts-Escolano, S.Oprea, V.Villena-Martinez, J.Garcia-Rodriguez, A review on deep learning techniques applied to semantic segmentation, arXiv preprint arXiv:1704.06857 (2017).
[113]D.P. Kingma, J.Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).
[114]X.Shi, Z.Chen, H.Wang, D.-Y. Yeung, W.-K. Wong, W.-c. Woo, Convolutional LSTM network: A machine learning approach for precipitation nowcasting, Advances in neural information processing systems 28 (2015).
[115]K.Cho, B.Van Merriënboer, D.Bahdanau, Y.Bengio, On the properties of neural machine translation: Encoder-decoder approaches, arXiv preprint arXiv:1409.1259 (2014).
[116]D.Bahdanau, K.Cho, Y.Bengio, Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
[117]A.Vaswani, N.Shazeer, N.Parmar, J.Uszkoreit, L.Jones, A.N. Gomez, Ł.Kaiser, I.Polosukhin, Attention is all you need, Advances in neural information processing systems 30 (2017).
[118]S.-i. Amari, Backpropagation and stochastic gradient descent method, Neurocomputing 5(4-5) (1993) 185–196.
[119]S.Ruder, An overview of gradient descent optimization algorithms, arXiv preprint arXiv:1609.04747 (2016).
[120]J.Duchi, E.Hazan, Y.Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of machine learning research 12(7) (2011).
[121]M.D. Zeiler, Adadelta: an adaptive learning rate method, arXiv preprint arXiv:1212.5701 (2012).
[122]G.Hinton, N.Srivastava, K.Swersky, Neural networks for machine learning lecture 6a overview of mini-batch gradient descent, Toronto University (2012).
[123]V.Buterin, et al., A next-generation smart contract and decentralized application platform, white paper 3(37) (2014) 2–1.
[124]C.Percival, S.Josefsson, The scrypt password-based key derivation function, Tech. rep. (2016).
[125]Cryptocurrency historical prices, last accessed 13 February 2024 (2021).
URL https://www.kaggle.com/datasets/sudalairajkumar/cryptocurrencypricehistory/data
[126]F.Takens, Detecting strange attractors in turbulence, in: Dynamical Systems and Turbulence, Warwick 1980: proceedings of a symposium held at the University of Warwick 1979/80, Springer, 2006, pp. 366–381.
[127]A.Gnauck, Interpolation and approximation of water quality time series and process identification, Analytical and bioanalytical chemistry 380 (2004) 484–492.
[128]A.Paszke, S.Gross, F.Massa, A.Lerer, J.Bradbury, G.Chanan, T.Killeen, Z.Lin, N.Gimelshein, L.Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, Advances in neural information processing systems 32 (2019).
[129]P.E. Mandaci, E.C. Cagli, Herding intensity and volatility in cryptocurrency markets during the COVID-19, Finance Research Letters 46 (2022) 102382.
[130]M.A. Naeem, E.Bouri, Z.Peng, S.J.H. Shahzad, X.V. Vo, Asymmetric efficiency of cryptocurrencies during COVID19, Physica A: Statistical Mechanics and its Applications 565 (2021) 125562.
[131]A.Tanwar, V.Kumar, Prediction of cryptocurrency prices using transformers and long short term neural networks, in: 2022 International Conference on Intelligent Controller and Computing for Smart Power (ICICCSP), IEEE, 2022, pp. 1–4.
[132]D.Wang, W.-Z. Lu, Forecasting of ozone level in time series using MLP model with a novel hybrid training algorithm, Atmospheric Environment 40(5) (2006) 913–924.
[133]D.Cucinotta, M.Vanelli, WHO declares COVID-19 a pandemic, Acta bio medica: Atenei parmensis 91(1) (2020) 157.
[134]K.G. Andersen, A.Rambaut, W.I. Lipkin, E.C. Holmes, R.F. Garry, The proximal origin of SARS-CoV-2, Nature medicine 26(4) (2020) 450–452.
[135]A.Brodeur, D.Gray, A.Islam, S.Bhuiyan, A literature review of the economics of COVID-19, Journal of economic surveys 35(4) (2021) 1007–1044.
[136]M.Leach, H.MacGregor, I.Scoones, A.Wilkinson, Post-pandemic transformations: How and why COVID-19 requires us to rethink development, World development 138 (2021) 105233.
[137]G.Miao, Z.Chen, H.Cao, W.Wu, X.Chu, H.Liu, L.Zhang, H.Zhu, H.Cai, X.Lu, et al., From immunogen to COVID-19 vaccines: Prospects for the post-pandemic era, Biomedicine & Pharmacotherapy 158 (2023) 114208.
[138]D.Łaskawiec, M.Grajek, P.Szlacheta, I.Korzonek-Szlacheta, Post-pandemic stress disorder as an effect of the epidemiological situation related to the COVID-19 pandemic, in: Healthcare, Vol.10, 2022, p. 975.
[139]Y.Jiang, H.Nie, W.Ruan, Time-varying long-term memory in bitcoin market, Finance Research Letters 25 (2018) 280–284.
[140]Y.Wang, M.Long, J.Wang, Z.Gao, P.S. Yu, PredRNN: Recurrent neural networks for predictive learning using spatiotemporal LSTMs, Advances in neural information processing systems 30 (2017).