When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning
Chen Li, Zhantao Yang, Fangyi Chen, Han Zhang, Anudeepsekhar Bolimera, Marios Savvides

TL;DR
This paper introduces a learnable, state-conditioned commitment depth for long-horizon vision-language reasoning, improving efficiency and success rates over fixed-depth methods.
Contribution
It formalizes commitment depth as a dynamic, learnable policy component and demonstrates its effectiveness in vision-language tasks.
Findings
The adaptive policy outperforms every non-degenerate fixed-depth baseline by up to 12.5 percentage points in solve rate.
It uses approximately 25% fewer primitive actions per episode.
Outperforms GPT-5.5 and Claude Sonnet on tested tasks.
Abstract
Long-horizon reasoning requires deciding not only what actions to take, but how deeply to commit before the next observation. We formalize this as commitment depth: the number of primitive actions executed open-loop between replans. Commitment depth induces a trade-off between replanning cost and compounding execution error, yet most existing long-horizon systems fix it as a hand-designed scalar. In this work, we instead treat commitment depth as a learnable, state-conditioned variable of the policy itself. We instantiate this within a model-native vision-language policy that jointly predicts both what to execute and for how long. Across Sliding Puzzle and Sokoban, the resulting adaptive policy Pareto-dominates every non-degenerate fixed-depth baseline, achieving up to 12.5 percentage points higher solve rate while using approximately 25% fewer primitive actions per episode.…
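To make the definition of commitment depth concrete, the following is a minimal sketch of an execute-then-replan loop in which the policy returns both a plan and the number of actions to run open-loop before the next observation. It is not the paper's implementation: the names `run_episode`, `toy_policy`, and `toy_step`, the 1-D integer environment, and the "commit to half the remaining distance" rule are all hypothetical stand-ins chosen only for illustration.

```python
from typing import Callable, List, Tuple

Policy = Callable[[int, int], Tuple[List[int], int]]


def run_episode(state: int, goal: int, policy: Policy,
                step: Callable[[int, int], int],
                max_actions: int = 100) -> Tuple[bool, int, int]:
    """Run one episode; return (solved, primitive_actions, replans)."""
    actions_used = 0
    replans = 0
    while state != goal and actions_used < max_actions:
        # One replan: the policy says what to execute and for how long.
        plan, depth = policy(state, goal)
        replans += 1
        if not plan:
            break  # nothing to execute; avoid looping forever
        # Execute the first `depth` actions open-loop (no new observation).
        for action in plan[:max(1, depth)]:
            state = step(state, action)
            actions_used += 1
            if state == goal or actions_used >= max_actions:
                break
    return state == goal, actions_used, replans


def toy_step(state: int, action: int) -> int:
    """Hypothetical deterministic 1-D environment: move by +1 or -1."""
    return state + action


def toy_policy(state: int, goal: int) -> Tuple[List[int], int]:
    """Hypothetical state-conditioned depth: commit deeply when far
    from the goal, replan every step when close."""
    distance = abs(goal - state)
    direction = 1 if goal > state else -1
    depth = max(1, distance // 2)
    return [direction] * depth, depth


def fixed_depth_one(state: int, goal: int) -> Tuple[List[int], int]:
    """Baseline for comparison: always replan after a single action."""
    return [1 if goal > state else -1], 1


if __name__ == "__main__":
    print(run_episode(0, 8, toy_policy, toy_step))
    print(run_episode(0, 8, fixed_depth_one, toy_step))
```

In this toy setting both policies use the same number of primitive actions because execution is noise-free; only the number of replans differs. The trade-off described in the abstract appears once open-loop execution can drift, which this sketch deliberately omits.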
