From LSTM to GPT-2: Recurrent and Transformer-Based Deep Learning Architectures for Multivariate High-Liquidity Cryptocurrency Price Forecasting


Dinçer E., KİLİMCİ Z. H.

Symmetry, vol. 18, no. 1, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 18 Issue: 1
  • Publication Date: 2026
  • DOI: 10.3390/sym18010032
  • Journal Name: Symmetry
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, INSPEC, zbMATH
  • Keywords: Autoformer, cryptocurrency forecasting, deep learning, GPT-2, Informer, LSTM, multivariate time series, technical indicators, Temporal Fusion Transformer (TFT), transformer models
  • Kocaeli University Affiliated: Yes

Abstract

This study introduces a unified and methodologically symmetric comparative framework for multivariate cryptocurrency forecasting, addressing long-standing inconsistencies in prior research where model families, feature sets, and preprocessing pipelines differ across studies. Under an identical and rigorously controlled experimental setup, we benchmark six deep learning architectures—LSTM, GPT-2, Informer, Autoformer, Temporal Fusion Transformer (TFT), and a Vanilla Transformer—together with four widely used econometric models (ARIMA, VAR, GARCH, and a Random Walk baseline). All models are evaluated using a shared multivariate feature space composed of more than forty technical indicators, identical normalization procedures, harmonized sliding-window configurations, and aligned temporal splits across five high-liquidity assets (BTC, ETH, XRP, XLM, and SOL). The experimental results show that transformer-based architectures consistently outperform both the recurrent baseline and classical econometric models across all assets. This superiority arises from the ability of attention mechanisms to capture long-range temporal dependencies and adaptively weight informative time steps, whereas recurrent models suffer from vanishing-gradient limitations and restricted effective memory. The best-performing deep learning models achieve MAPE values of 0.0289 (BTC, GPT-2), 0.0198 (ETH, Autoformer), 0.0418 (XRP, Informer), 0.0469 (XLM, Informer), and 0.0578 (SOL, TFT), substantially improving upon the performance of both LSTM and all econometric baselines. These findings highlight the effectiveness of attention-based architectures in modeling volatility-driven nonlinear dynamics and establish a reproducible, symmetry-preserving benchmark for future research in deep-learning-based financial forecasting.
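The sliding-window formation and the MAPE metric mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the window length, the one-step-ahead target, and the function names are assumptions introduced here for clarity.

```python
import numpy as np

def make_windows(features, targets, window=30):
    """Build sliding-window samples from a multivariate feature matrix.

    features: (T, F) array of per-step technical indicators
    targets:  (T,) array of prices to forecast one step ahead
    Returns X of shape (T - window, window, F) and y of shape (T - window,).
    Window length and one-step horizon are illustrative assumptions.
    """
    X = np.stack([features[i:i + window] for i in range(len(features) - window)])
    y = targets[window:]  # each window predicts the value right after it
    return X, y

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (e.g. 0.0289 = 2.89%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))
```

Under this formulation, a feature matrix with T time steps and F indicators yields T − window training samples, each pairing a fixed-length history with the next price.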