Can Transformers Learn to Verify During Backtracking Search?
Yin Jun Phua, Tony Ribeiro, Tuan Nguyen, Katsumi Inoue

TL;DR
This paper identifies two failure modes of decoder-only transformers on backtracking search tasks and proposes structural fixes so that continue-or-backtrack decisions depend on the current search state rather than on the trajectory that reached it.
Contribution
It diagnoses two failure modes of transformers trained on cumulative traces, scattered retrieval and history entanglement, and introduces localization and Selective State Attention (SSA) to address them.
Findings
SSA makes decisions consistent across identical states reached through different histories.
Localization improves the model's ability to focus on relevant state features.
The proposed fixes outperform baseline causal models in verification tasks.
Abstract
Backtracking search underlies classical constraint solvers, planners, and theorem provers. Recent transformer-based reasoning systems explore search trees over their own intermediate steps. A common training recipe fits an autoregressive next-token loss on offline solver traces. The model's input at each step is a cumulative trace of all prior decisions. The optimal continue-or-backtrack predictor depends only on the current search state, since two trajectories reaching the same state admit the same viable continuations. We show that decoder-only transformers trained on cumulative traces fail this requirement in two ways: the trace can scatter state features across many positions (scattered retrieval), and the predictor can condition on the trajectory rather than the state (history entanglement). We address scattered retrieval with localization, a trace-level fix that rewrites each…
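The state-dependence claim in the abstract can be illustrated with a toy example. The sketch below is not the paper's method or code: it uses 4-queens as an assumed stand-in search problem, and the names `replay`, `viable`, and `decide` are hypothetical helpers introduced only for illustration. It shows two different cumulative traces that collapse to the same search state, so a predictor that depends only on the state must return the same continue-or-backtrack decision for both.

```python
from typing import Dict, List, Tuple

# Hypothetical trace format (an assumption, not the paper's):
# ("assign" | "undo", row, column) for an n-queens search.
Event = Tuple[str, int, int]


def replay(trace: List[Event]) -> Dict[int, int]:
    """Collapse a cumulative trace into the current partial assignment."""
    state: Dict[int, int] = {}
    for kind, row, col in trace:
        if kind == "assign":
            state[row] = col
        elif kind == "undo":
            state.pop(row, None)
    return state


def consistent(state: Dict[int, int]) -> bool:
    """True if no two placed queens share a column or a diagonal."""
    items = list(state.items())
    for i, (r1, c1) in enumerate(items):
        for r2, c2 in items[i + 1:]:
            if c1 == c2 or abs(r1 - r2) == abs(c1 - c2):
                return False
    return True


def viable(state: Dict[int, int], n: int) -> bool:
    """Exhaustively check whether the partial placement can be completed."""
    if not consistent(state):
        return False
    free = [r for r in range(n) if r not in state]
    if not free:
        return True
    row = free[0]
    return any(viable({**state, row: col}, n) for col in range(n))


def decide(state: Dict[int, int], n: int) -> str:
    """State-only oracle: the decision ignores how the state was reached."""
    return "continue" if viable(state, n) else "backtrack"


# Two histories that end in the same state: one direct, one with a detour.
trace_a: List[Event] = [("assign", 0, 1), ("assign", 1, 3)]
trace_b: List[Event] = [
    ("assign", 0, 0), ("assign", 1, 2),
    ("undo", 1, 2), ("undo", 0, 0),
    ("assign", 0, 1), ("assign", 1, 3),
]

decision_a = decide(replay(trace_a), 4)
decision_b = decide(replay(trace_b), 4)
dead_end = decide({0: 0, 1: 2}, 4)
```

Here `trace_a` and `trace_b` differ as token sequences but replay to the same placement, so `decision_a` and `decision_b` agree by construction. A model that reads the raw cumulative trace has no such guarantee, which is the gap the abstract describes.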
