pLSTM: parallelizable Linear Source Transition Mark networks
Korbinian Pöppel, Richard Freinschlag, Thomas Schmied, Wei Lin, Sepp Hochreiter

TL;DR
The paper introduces pLSTM, a parallelizable linear RNN architecture designed for processing complex data structures like DAGs and grids, enabling efficient long-range information propagation and outperforming Transformers on certain tasks.
Contribution
It extends linear RNNs to multi-dimensional and graph-structured data while keeping the computation parallelizable, so that long-range dependencies can be propagated efficiently.
Findings
pLSTM generalizes well to larger images, outperforming Transformers.
pLSTM effectively handles long-range dependencies in DAGs.
pLSTM shows strong performance on molecular graph and computer vision benchmarks.
Abstract
Modern recurrent architectures, such as xLSTM and Mamba, have recently challenged the Transformer in language modeling. However, their structure constrains their applicability to sequences only or requires processing multi-dimensional data structures, such as images or molecular graphs, in a pre-defined sequential order. In contrast, Multi-Dimensional RNNs (MDRNNs) are well suited for data with a higher level structure, like 2D grids, trees, and directed acyclic graphs (DAGs). In this work, we extend the notion of multi-dimensionality to linear RNNs. We introduce parallelizable Linear Source Transition Mark networks (pLSTMs) using Source, Transition, and Mark gates that act on the line graph of a general DAG. This enables parallelization in analogy to parallel associative scans and the chunkwise-recurrent form of sequential linear RNNs, but for DAGs. For regular grids (1D and 2D), like…
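The abstract refers to parallel associative scans for sequential linear RNNs. As a rough illustration of that 1D idea only (not of pLSTM itself or of its DAG generalization), the sketch below computes the scalar linear recurrence h_t = a_t * h_{t-1} + b_t both step by step and with a recursive-doubling scan. The function names, the scalar gates a_t and b_t, and the zero initial state are assumptions chosen for this example and do not come from the paper.

```python
import numpy as np


def combine(left, right):
    """Compose two affine maps h -> a*h + b, applying `left` first, then `right`."""
    a_l, b_l = left
    a_r, b_r = right
    return a_r * a_l, a_r * b_l + b_r


def sequential_scan(a, b):
    """Reference: run the recurrence h_t = a_t * h_{t-1} + b_t with h_{-1} = 0."""
    h = 0.0
    out = []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return np.array(out)


def associative_scan(a, b):
    """Inclusive scan by recursive doubling (Hillis-Steele style).

    Each pass combines position i - shift with position i for all i at once,
    so only about log2(n) passes are needed. This works because `combine`
    is associative.
    """
    a = np.asarray(a, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    n = len(a)
    shift = 1
    while shift < n:
        a_new, b_new = a.copy(), b.copy()
        a_new[shift:], b_new[shift:] = combine(
            (a[:-shift], b[:-shift]), (a[shift:], b[shift:])
        )
        a, b = a_new, b_new
        shift *= 2
    return b  # with h_{-1} = 0, the accumulated offset is the hidden state


rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, size=7)   # hypothetical per-step decay gates
b = rng.normal(size=7)              # hypothetical per-step inputs
print(np.allclose(sequential_scan(a, b), associative_scan(a, b)))
```

Every pass of the doubling loop is a vectorized operation over the whole sequence, which is what makes this form attractive on parallel hardware. According to the abstract, pLSTM obtains an analogous parallelization for DAGs rather than for plain sequences.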
Taxonomy
Topics: Power Systems and Technologies · Algorithms and Data Compression · Parallel Computing and Optimization Techniques
