Fusing Memory and Attention: A study on LSTM, Transformer and Hybrid Architectures for Symbolic Music Generation
Soudeep Ghoshal, Sandipan Chakraborty, Pradipto Chowdhury, Himanshu Buckchash

TL;DR
This study systematically compares LSTM and Transformer models for symbolic music generation, identifies their strengths and weaknesses in modeling local and global musical structure, and proposes a hybrid architecture that outperforms either model on its own.
Contribution
The paper provides a detailed analysis of the properties of LSTMs and Transformers in symbolic music generation (SMG) and introduces a hybrid model that combines their strengths for improved music generation.
Findings
LSTMs excel at modeling local patterns but fail to preserve long-range dependencies.
Transformers effectively model global structure but produce irregular phrasing.
The hybrid model outperforms baselines in local and global coherence.
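To make the local versus global distinction in these findings concrete, here is a minimal sketch of two toy measures over a melody given as a list of MIDI pitch numbers. Both functions and their names are illustrative assumptions: they are not the 17 metrics used in the paper, which this summary does not list. The first looks only at adjacent notes (local), while the second looks for interval patterns that recur anywhere in the piece (global).

```python
from collections import Counter


def mean_abs_interval(pitches):
    """Local measure (hypothetical): mean absolute semitone step between adjacent notes."""
    steps = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return sum(steps) / len(steps) if steps else 0.0


def repeated_ngram_ratio(pitches, n=4):
    """Global measure (hypothetical): share of interval n-grams that occur more than once."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    grams = [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    return sum(1 for g in grams if counts[g] > 1) / len(grams)


# An eight-note phrase played twice: smooth stepwise motion with literal repetition.
melody = [60, 62, 64, 65, 67, 65, 64, 62] * 2
local_score = mean_abs_interval(melody)
global_score = repeated_ngram_ratio(melody, n=4)
```

A model with good local continuity but weak long-range structure would be expected to keep the first value small while the second drops toward zero on longer pieces; the reverse pattern would suggest repeated material joined by awkward local steps.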
Abstract
Machine learning techniques, such as Transformers and Long Short-Term Memory (LSTM) networks, play a crucial role in Symbolic Music Generation (SMG). Existing literature indicates a difference between LSTMs and Transformers regarding their ability to model local melodic continuity versus maintaining global structural coherence. However, their specific properties within the context of SMG have not been systematically studied. This paper addresses this gap by providing a fine-grained comparative analysis of LSTMs versus Transformers for SMG, examining local and global properties in detail using 17 musical quality metrics on the Deutschl dataset. We find that LSTM networks excel at capturing local patterns but fail to preserve long-range dependencies, while Transformers model global structure effectively but tend to produce irregular phrasing. Based on this analysis and leveraging their…
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception
