Exploring Depth Generalization in Large Language Models for Solving Recursive Logic Tasks
Zhiyuan He

TL;DR
This paper investigates the limitations of large language models in handling recursive reasoning tasks of increasing depth, identifies architectural constraints, and proposes a novel pipeline to improve depth generalization across multiple logical domains.
Contribution
The study reveals the depth generalization challenge in transformer models and introduces a looped locate-and-replace pipeline with specialized models to enhance recursive problem solving.
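A looped locate-and-replace procedure can be illustrated on fully parenthesized Boolean expressions. In each iteration, a "locate" step finds an innermost sub-expression, a "replace" step reduces it to a single literal, and the loop repeats until no nesting remains. The paper uses specialized models for these steps. In this minimal sketch, `locate` and `replace` are hypothetical rule-based stand-ins, and the input format (`T`, `F`, `and`, `or`, `not`, one operator per parenthesized group) is an assumption made for illustration.

```python
import re

# An innermost group is "(" followed by text with no parentheses, then ")".
INNERMOST = re.compile(r"\(([^()]*)\)")


def locate(expr: str):
    """Hypothetical stand-in for the locate model: leftmost innermost group, or None."""
    return INNERMOST.search(expr)


def _literal(tok: str) -> bool:
    if tok not in ("T", "F"):
        raise ValueError(f"expected T or F, got {tok!r}")
    return tok == "T"


def replace(flat: str) -> str:
    """Hypothetical stand-in for the replace model: reduce a parenthesis-free
    expression (a literal, `not X`, `X and Y`, or `X or Y`) to a literal."""
    tok = flat.split()
    if len(tok) == 1:
        result = _literal(tok[0])
    elif len(tok) == 2 and tok[0] == "not":
        result = not _literal(tok[1])
    elif len(tok) == 3 and tok[1] in ("and", "or"):
        a, b = _literal(tok[0]), _literal(tok[2])
        result = (a and b) if tok[1] == "and" else (a or b)
    else:
        raise ValueError(f"unsupported sub-expression: {flat!r}")
    return "T" if result else "F"


def solve(expr: str) -> tuple[str, int]:
    """Loop locate -> replace until no parentheses remain; return (value, steps)."""
    steps = 0
    while (match := locate(expr)) is not None:
        # Splice the reduced literal back in place of the located group.
        expr = expr[: match.start()] + replace(match.group(1)) + expr[match.end():]
        steps += 1
    return replace(expr), steps


print(solve("((T and F) or (not F))"))  # ('T', 3)
```

Each iteration removes one pair of parentheses, so the loop always terminates. The number of iterations grows with the amount of nesting, but each individual step only ever sees a flat sub-expression. This is one plausible intuition for why such a loop can extend to depths not seen in training.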
Findings
Transformer models struggle with deeper recursion than seen during training.
The proposed pipeline improves out-of-distribution depth generalization.
The method is effective across Boolean algebra, recursive arithmetic, and propositional logic.
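The first finding concerns a mismatch between training and test depth. One way to set up this kind of evaluation is to generate expressions of an exact nesting depth, train only on shallow ones, and test only on deeper ones. The sketch below is hypothetical throughout: the generator `gen`, the helper `depth_split`, the expression format, and the example thresholds (train on depths 1 to 3, test on depths 5 and 6) are illustrative assumptions, not the paper's actual datasets or depth ranges.

```python
import random


def gen(depth: int, rng: random.Random) -> str:
    """Hypothetical generator: random fully parenthesized Boolean expression
    whose nesting depth is exactly `depth`."""
    if depth == 0:
        return rng.choice(["T", "F"])
    if rng.random() < 0.3:
        return f"(not {gen(depth - 1, rng)})"
    # One operand carries the remaining depth and the other is no deeper,
    # so the outer parentheses add exactly one level.
    deep = gen(depth - 1, rng)
    other = gen(rng.randint(0, depth - 1), rng)
    left, right = (deep, other) if rng.random() < 0.5 else (other, deep)
    return f"({left} {rng.choice(['and', 'or'])} {right})"


def depth_split(train_max: int, test_depths: list[int], n_per_depth: int, seed: int = 0):
    """Hypothetical split: train on depths 1..train_max, test only on test_depths."""
    rng = random.Random(seed)
    train = [gen(d, rng) for d in range(1, train_max + 1) for _ in range(n_per_depth)]
    test = [gen(d, rng) for d in test_depths for _ in range(n_per_depth)]
    return train, test


train, test = depth_split(train_max=3, test_depths=[5, 6], n_per_depth=2)
print(len(train), len(test))  # 6 4
print(test[0])
```

Because the test depths are strictly greater than any training depth, accuracy on `test` measures out-of-distribution depth generalization rather than in-distribution performance.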
Abstract
Large language models have demonstrated remarkable capabilities across many tasks, yet they face significant challenges with recursive reasoning problems, that is, problems that require resolving nested hierarchical structures. While prior research has extensively studied length generalization (a model's ability to handle sequences longer than those seen during training), we investigate a distinct and underexplored limitation: depth generalization. Here, depth refers to the number of nested levels in a hierarchical problem, such as the layers of parentheses in a mathematical expression or the nesting of logical clauses in a Boolean formula. Our work reveals that standard transformer architectures struggle with problems involving deeper recursion than encountered during training, even when they perform well on longer but non-nested sequences. This limitation stems from their inability to…
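The abstract separates depth from length. A minimal sketch shows that the two vary independently. It assumes that depth is measured as the maximum number of simultaneously open parentheses, which is one natural reading of "layers of parentheses"; the function name `nesting_depth` is illustrative.

```python
def nesting_depth(expr: str) -> int:
    """Maximum number of simultaneously open parentheses; rejects unbalanced input."""
    depth = best = 0
    for ch in expr:
        if ch == "(":
            depth += 1
            best = max(best, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced: unexpected ')'")
    if depth != 0:
        raise ValueError("unbalanced: missing ')'")
    return best


long_flat = " + ".join(["1"] * 50)        # long but not nested
short_deep = "(1 + (2 * (3 - (4 + 5))))"  # short but nested four levels

print(len(long_flat), nesting_depth(long_flat))    # 197 0
print(len(short_deep), nesting_depth(short_deep))  # 25 4
```

A model can generalize to inputs like `long_flat` (longer than anything seen in training) and still fail on inputs like `short_deep` if its training data never exceeded a smaller depth.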
