Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Nikola Zubić, Federico Soldá, Aurelio Sulser, Davide Scaramuzza

TL;DR
This paper examines fundamental limitations of deep learning models, particularly SSMs and Transformers, on complex reasoning and function composition tasks, combining theoretical proofs with empirical experiments that show significant performance degradation.
Contribution
The paper provides the first theoretical analysis showing that one-layer SSMs are inefficient at complex function composition, and it demonstrates these limitations empirically on a range of reasoning tasks.
Findings
One-layer SSMs cannot efficiently perform large-scale function composition.
Transformers require many steps even with Chain-of-Thought prompting.
Models show degraded performance and rely on shortcuts in complex reasoning tasks.
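To make the function composition setting in the first finding concrete, here is a minimal Python sketch. It is an illustration only, not the paper's construction: the functions are modeled as random lookup tables, and the names `make_table`, `compose`, and `table_bits` are hypothetical. The bit count is a naive description length for an arbitrary table, shown to give a feel for how the information involved grows with the domain; it is not the bound proved in the paper.

```python
import math
import random


def make_table(domain_size, codomain_size, rng):
    """Random function {0..domain_size-1} -> {0..codomain_size-1} as a lookup table."""
    return [rng.randrange(codomain_size) for _ in range(domain_size)]


def compose(f, g, x):
    """Return g(f(x)) for lookup tables f and g."""
    return g[f[x]]


def table_bits(domain_size, codomain_size):
    """Bits needed to write down an arbitrary table of this shape (naive count)."""
    return domain_size * math.ceil(math.log2(codomain_size))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 8                   # assumed toy domain size
f = make_table(n, n, rng)
g = make_table(n, n, rng)

# Answering a query x requires looking up f first, then g at the result.
answers = [compose(f, g, x) for x in range(n)]
```

In this toy setting, a model that reads `f` and `g` and only then receives the query `x` must carry enough information about both tables forward to answer any query, which is where the state size becomes the bottleneck as the domain grows.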
Abstract
Despite their successes, deep learning models struggle with tasks requiring complex reasoning and function composition. We present a theoretical and empirical investigation into the limitations of Structured State Space Models (SSMs) and Transformers in such tasks. We prove that one-layer SSMs cannot efficiently perform function composition over large domains without impractically large state sizes, and even with Chain-of-Thought prompting, they require a number of steps that scale unfavorably with the complexity of the function composition. Also, the language of a finite-precision SSM is within the class of regular languages. Our experiments corroborate these theoretical findings. Evaluating models on tasks including various function composition settings, multi-digit multiplication, dynamic programming, and Einstein's puzzle, we find significant performance degradation even with…
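The claim that a finite-precision SSM recognizes only regular languages can be pictured with a small sketch: once the state can take only finitely many values, the recurrence is a finite-state machine. The Python below is a toy under stated assumptions, not the paper's proof. It uses a scalar state, an assumed update h' = quantize(a·h + b·x), a 16-level grid on [-1, 1], and hypothetical names (`to_index`, `step`, `reachable_transitions`).

```python
from collections import deque

LEVELS = 16          # assumed number of representable state values
LO, HI = -1.0, 1.0   # assumed clipping range of the state
STEP = (HI - LO) / (LEVELS - 1)


def to_value(index):
    """Map a grid index back to its real value."""
    return LO + index * STEP


def to_index(value):
    """Clip to [LO, HI] and round to the nearest grid index."""
    clipped = max(LO, min(HI, value))
    return round((clipped - LO) / STEP)


def step(index, symbol, a=0.5, b=0.7):
    """One quantized update h' = quantize(a*h + b*x) on a scalar state."""
    return to_index(a * to_value(index) + b * symbol)


def reachable_transitions(start, alphabet):
    """Breadth-first search over states, recording the transition table."""
    table = {}
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for symbol in alphabet:
            nxt = step(state, symbol)
            table[(state, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, table


start = to_index(0.0)
states, delta = reachable_transitions(start, alphabet=(-1, 1))
```

Here `delta` is the transition function of a deterministic finite automaton with at most `LEVELS` states. Choosing any subset of `states` as accepting yields a regular language, which is the intuition behind the abstract's statement.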
Taxonomy
Topics: Neural Networks and Applications · Computability, Logic, AI Algorithms · Machine Learning and Algorithms
