xLSTM: Extended Long Short-Term Memory

Maximilian Beck; Korbinian Pöppel; Markus Spanring; Andreas Auer; Oleksandra Prudnikova; Michael Kopp; Günter Klambauer; Johannes Brandstetter; Sepp Hochreiter

arXiv:2405.04517·cs.LG·December 9, 2024·87 cites



TL;DR

This paper introduces xLSTM, an extended LSTM architecture with exponential gating and modified memory structures, enabling scalable language modeling that rivals Transformers and State Space Models in performance.

Contribution

The paper presents novel extensions to LSTM, including exponential gating and new memory structures, to improve scalability and performance in large-scale language modeling.

Findings

01. xLSTM outperforms traditional LSTMs in large-scale tasks.

02. Exponential gating enhances stability and capacity.

03. Modified memory structures improve scalability.

Abstract

In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories, in particular they constituted the first Large Language Models (LLMs). However, the advent of the Transformer technology with parallelizable self-attention at its core marked the dawn of a new era, outpacing LSTMs at scale. We now raise a simple question: How far do we get in language modeling when scaling LSTMs to billions of parameters, leveraging the latest techniques from modern LLMs, but mitigating known limitations of LSTMs? Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing,…
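The "exponential gating with appropriate normalization and stabilization" mentioned in the abstract can be illustrated with a minimal scalar sketch. The abstract excerpt gives no equations, so everything here is an assumption: the function name `stabilized_exp_gate_step`, its arguments, the choice of an exponential forget gate, and the running-maximum stabilizer are illustrative, not the authors' implementation. The idea is to keep a normalizer state alongside the cell state, and to rescale both gates by a log-space running maximum so that `exp` never sees a positive argument.

```python
import math


def stabilized_exp_gate_step(c_prev, n_prev, m_prev, i_pre, f_pre, z, o):
    """One scalar memory update with exponential input and forget gates.

    Hypothetical helper for illustration only.
    i_pre, f_pre: gate pre-activations (any real number).
    z: candidate cell input; o: output gate value, assumed already in (0, 1].
    c_prev, n_prev, m_prev: previous cell, normalizer, and stabilizer states.
    """
    # Stabilizer: running maximum of the log-space gate magnitudes.
    m = max(f_pre + m_prev, i_pre)
    # Rescaled gates. Both exponents are <= 0, so exp cannot overflow.
    i_gate = math.exp(i_pre - m)
    f_gate = math.exp(f_pre + m_prev - m)
    # Cell state and normalizer share the same gates, so the common
    # rescaling factor cancels in the ratio c / n.
    c = f_gate * c_prev + i_gate * z
    n = f_gate * n_prev + i_gate
    h = o * (c / n)
    return c, n, m, h


# Usage: pre-activations of 1000 would overflow a naive exp(1000),
# but the stabilized step handles them.
state = (0.0, 0.0, 0.0)
for i_pre, f_pre, z in [(1000.0, 1000.0, 0.5), (0.0, 0.0, 1.0)]:
    c, n, m, h = stabilized_exp_gate_step(*state, i_pre, f_pre, z, o=1.0)
    state = (c, n, m)
```

Because the cell state and the normalizer are scaled by the same factor, the ratio `c / n` (and hence `h`) is the same as in the unstabilized recurrence whenever the latter is finite; only the numerical range of the stored states changes.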


Code & Models

Videos

xLSTM: Extended Long Short-Term Memory · youtube

LSTM: The Comeback Story? [Prof. Sepp Hochreiter] · youtube

xLSTM: Extended Long Short-Term Memory · slideslive

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · Sigmoid Activation · Tanh Activation · Dropout · Label Smoothing · Residual Connection · Long Short-Term Memory · Softmax · Position-Wise Feed-Forward Layer · Multiplicative LSTM