Boule or Baguette? A Study on Task Topology, Length Generalization, and the Benefit of Reasoning Traces

William L. Tong; Ege Cakar; Cengiz Pehlevan

arXiv:2602.14404·cs.AI·February 17, 2026

TL;DR

This paper introduces PITA, a large dataset for propositional logic reasoning, and investigates how reasoning traces affect models' ability to generalize to longer proofs, revealing strengths in broad tasks and limitations in deep ones.

Contribution

The paper presents PITA, a new large-scale reasoning dataset, and analyzes the impact of reasoning traces on length generalization, highlighting their benefits and limitations.

Findings

01 RT models excel on broad, shallow tasks

02 RT models struggle with narrow, deep tasks

03 Generalization performance depends on task breadth and depth

Abstract

Recent years have witnessed meteoric progress in reasoning models: neural networks that generate intermediate reasoning traces (RTs) before producing a final output. Despite the rapid advancement, our understanding of how RTs support reasoning, and of the limits of this paradigm, remains incomplete. To promote greater clarity, we introduce PITA: a novel large-scale dataset of over 23 million statements in propositional logic and their corresponding proofs. As a benchmark for robust reasoning, we focus on length generalization: if a model is trained to determine truth or falsity on statements with proofs up to a fixed length, how well does it generalize to statements requiring longer proofs? We propose notions of (1) task depth and (2) task breadth, which measure respectively (1) the number of steps required to solve an example from a task and (2) the number of unique examples across a task.…
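The length-generalization setup and the two task measures described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `Example` record, the toy statements, their made-up `proof_steps` values, and the choice to summarize a task's depth by its maximum proof length. It does not reflect the actual PITA schema or the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """Hypothetical record; not the real PITA format."""
    statement: str    # propositional formula as plain text
    proof_steps: int  # number of steps in its proof (illustrative values)
    is_true: bool     # label the model is asked to predict


def length_split(examples, max_train_steps):
    """Train on proofs of at most max_train_steps steps, test on longer ones."""
    train = [e for e in examples if e.proof_steps <= max_train_steps]
    test = [e for e in examples if e.proof_steps > max_train_steps]
    return train, test


def task_depth(examples):
    """Depth summarized as the longest proof in the task (an assumption)."""
    return max(e.proof_steps for e in examples)


def task_breadth(examples):
    """Breadth as the number of unique statements in the task."""
    return len({e.statement for e in examples})


# Toy task: step counts are invented, not derived from a proof system.
toy = [
    Example("p -> p", 1, True),
    Example("(p & q) -> p", 2, True),
    Example("p & ~p", 3, False),
    Example("((p -> q) & p) -> q", 4, True),
]

train, test = length_split(toy, max_train_steps=2)
print(len(train), len(test), task_depth(toy), task_breadth(toy))
```

In this framing, a model is fit on `train` and scored on `test`, so accuracy on `test` measures generalization to proofs longer than any seen during training.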

Code & Models

Datasets

williamtong105/pita
dataset · 232 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Constraint Satisfaction and Optimization