Transformers versus LSTMs for electronic trading
Paul Bilokon, Yitao Qiu

TL;DR
This study compares Transformer and LSTM models for financial time series prediction, finding that Transformers have limited advantages over LSTMs, which perform better on difference-based predictions.
Contribution
The paper introduces a new Transformer architecture for financial prediction and compares it with an enhanced LSTM model called DLSTM across multiple tasks.
Findings
Transformers show limited advantage in absolute price prediction.
LSTM-based models outperform in difference sequence prediction.
New architectures improve performance on financial time series tasks.
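The difference between the two prediction targets named in the findings can be illustrated with a minimal sketch. It assumes a plain list of prices; the function name `make_targets` and the parameters `lookback` and `horizon` are hypothetical and are not taken from the paper.

```python
def make_targets(prices, lookback, horizon):
    """Build (window, absolute_target, difference_target) triples.

    Hypothetical helper for illustration only:
    - absolute target: the price `horizon` steps after the window ends
    - difference target: that price minus the last price in the window
    """
    samples = []
    for t in range(lookback - 1, len(prices) - horizon):
        window = prices[t - lookback + 1 : t + 1]      # last `lookback` prices up to t
        absolute = prices[t + horizon]                 # absolute price target
        difference = prices[t + horizon] - prices[t]   # difference target
        samples.append((window, absolute, difference))
    return samples


# Toy price series (made-up values, not data from the study)
prices = [100.0, 100.5, 100.25, 101.0, 100.75, 101.5]
samples = make_targets(prices, lookback=3, horizon=1)
for window, absolute, difference in samples:
    print(window, absolute, difference)
```

The absolute target stays close to the last observed price, so a model that simply copies the last input value can look accurate on it. The difference target removes that level, which is one plausible reason the two tasks can rank models differently.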
Abstract
With the rapid development of artificial intelligence, long short-term memory (LSTM), a kind of recurrent neural network (RNN), has been widely applied to time series prediction. Like the RNN, the Transformer is designed to handle sequential data. As the Transformer achieved great success in Natural Language Processing (NLP), researchers became interested in its performance on time series prediction, and many Transformer-based solutions for long time series forecasting have appeared recently. However, when it comes to financial time series prediction, LSTM is still the dominant architecture. Therefore, the question this study aims to answer is whether Transformer-based models can be applied to financial time series prediction and beat LSTM. To answer this question, various LSTM-based and Transformer-based models are compared on multiple financial prediction tasks based on…
Peer Reviews
Decision: Submitted to ICLR 2025
1. The paper addresses a relevant and significant question by comparing LSTM and Transformer models in financial time series forecasting.
2. The experimental setup is extensive and provides substantial data.
1. The paper lacks code and detailed implementation information for both the Transformer and LSTM models, which limits reproducibility.
2. The novelty of the proposed approach is limited. While the authors introduce a DLSTM model to improve performance, the idea of decomposition was previously explored in models like DLinear [1], diminishing the originality of the contribution. Beyond the comparative analysis, additional innovation is also limited.
3. The decomposition strategy appears to be app
Originality: The study offers a novel perspective on the application of LSTM-based and Transformer-based models in financial time series forecasting, specifically in the context of electronic trading using high-frequency LOB data. The authors introduce a new LSTM-based model, DLSTM, which creatively combines LSTM with a time series decomposition approach inspired by the Autoformer architecture. This innovative integration of existing ideas allows DLSTM to outperform other models in the mid-price
While the paper presents valuable insights and contributions, there are a few areas that could be improved or require further clarification: Limited dataset diversity: The experiments in this study are conducted using LOB data from a single cryptocurrency pair (BTC-USDT or ETH-USDT) on one exchange (Binance). To demonstrate the generalizability of the proposed DLSTM model and the comparative analysis between LSTM-based and Transformer-based models, it would be beneficial to include a wider rang
Relevant Application: The use of LSTM and Transformer models for financial predictions on LOB data is timely and relevant given the growing interest in high-frequency trading and predictive models in finance.
Comparative Scope: The study covers multiple models and tasks, providing a broad comparison between LSTM- and Transformer-based architectures on real-world financial data.
Unconvincing Novelty: The paper lacks substantial novelty. The DLSTM model is essentially a combination of existing methods, such as time series decomposition and LSTM layers, without a clear innovation. Similarly, the Transformer modifications are incremental and do not provide a compelling improvement. As a result, the contributions seem incremental and insufficiently distinct from existing work in financial time series forecasting.
Interpretability Issues: The added complexity of Transformer
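The time series decomposition that the reviews refer to can be sketched as a moving-average split into a trend and a remainder, in the style commonly associated with Autoformer and DLinear. This is an illustrative sketch only: the function names, the odd `kernel_size`, and the edge-replication padding are assumptions, and how DLSTM feeds the two components into its LSTM layers is not described in the excerpts above, so that part is not shown.

```python
def moving_average(series, kernel_size):
    """Centered moving average with edge replication (assumed padding scheme)."""
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    pad = (kernel_size - 1) // 2
    # Repeat the first and last values so the output keeps the input length
    padded = [series[0]] * pad + list(series) + [series[-1]] * pad
    return [
        sum(padded[i : i + kernel_size]) / kernel_size
        for i in range(len(series))
    ]


def decompose(series, kernel_size):
    """Split a series into (trend, remainder) with trend + remainder == series."""
    trend = moving_average(series, kernel_size)
    remainder = [x - t for x, t in zip(series, trend)]
    return trend, remainder


# Toy input (made-up values, not data from the study)
series = [1.0, 2.0, 3.0, 4.0, 5.0]
trend, remainder = decompose(series, kernel_size=3)
print(trend)
print(remainder)
```

In a decomposition-based model, each component would typically be handled by its own sub-network and the outputs summed; whether DLSTM follows exactly that pattern should be checked against the paper itself.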
Taxonomy
Topics: Stock Market Forecasting Methods · Data Stream Mining Techniques · Neural Networks and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Dense Connections · Position-Wise Feed-Forward Layer · Sigmoid Activation · Absolute Position Encodings · Tanh Activation · Residual Connection · Adam
