When to Re-Commit: Temporal Abstraction Discovery for Long-Horizon Vision-Language Reasoning

Chen Li; Zhantao Yang; Fangyi Chen; Han Zhang; Anudeepsekhar Bolimera; Marios Savvides

arXiv:2605.09860·cs.AI·May 21, 2026

TL;DR

This paper introduces a learnable, state-conditioned commitment depth for long-horizon vision-language reasoning, improving efficiency and success rates over fixed-depth methods.

Contribution

It formalizes commitment depth, the number of primitive actions executed open-loop between replans, as a learnable, state-conditioned component of the policy and evaluates it on Sliding Puzzle and Sokoban.

Findings

01

The adaptive policy outperforms every non-degenerate fixed-depth baseline by up to 12.5 percentage points in solve rate.

02

It uses approximately 25% fewer primitive actions per episode.

03

Outperforms GPT-5.5 and Claude Sonnet on tested tasks.

Abstract

Long-horizon reasoning requires deciding not only what actions to take, but how deeply to commit before the next observation. We formalize this as commitment depth: the number of primitive actions executed open-loop between replans. Commitment depth induces a trade-off between replanning cost and compounding execution error, yet most existing long-horizon systems fix it as a hand-designed scalar. In this work, we instead treat commitment depth as a learnable, state-conditioned variable of the policy itself. We instantiate this within a model-native vision-language policy that jointly predicts both what to execute and for how long. Across Sliding Puzzle and Sokoban, the resulting adaptive policy Pareto-dominates every non-degenerate fixed-depth baseline, achieving up to 12.5 percentage points higher solve rate while using approximately 25% fewer primitive actions per episode.…
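
The plan-then-commit loop described in the abstract can be illustrated with a toy sketch. This is not the authors' method or environment: the 1-D walk, the `rollout` helper, and the `fixed_depth` and `adaptive` policies are all hypothetical stand-ins, chosen only to show how commitment depth is counted and why a fixed depth can be either too shallow (many replans) or too deep (committing past the goal).

```python
from typing import Callable, List, Tuple

# Hypothetical toy setting: the state is a position on a line,
# and each primitive action moves it by +1 or -1.
Policy = Callable[[int, int], Tuple[List[int], int]]


def rollout(state: int, goal: int, policy: Policy,
            max_replans: int = 50) -> Tuple[bool, int, int]:
    """Run the plan/commit loop; return (solved, replans, primitive_actions)."""
    replans = 0
    steps = 0
    while state != goal and replans < max_replans:
        plan, depth = policy(state, goal)  # one replan: what to do, and for how long
        replans += 1
        for action in plan[:depth]:        # open-loop: no observation between actions
            state += action
            steps += 1
    return state == goal, replans, steps


def fixed_depth(k: int) -> Policy:
    """Baseline (assumed for illustration): always commit to exactly k actions."""
    def policy(state: int, goal: int) -> Tuple[List[int], int]:
        direction = 1 if goal > state else -1
        return [direction] * k, k
    return policy


def adaptive(state: int, goal: int) -> Tuple[List[int], int]:
    """State-conditioned depth (hand-coded here, learned in the paper)."""
    direction = 1 if goal > state else -1
    depth = max(1, abs(goal - state))
    return [direction] * depth, depth


if __name__ == "__main__":
    for name, pol in [("fixed k=1", fixed_depth(1)),
                      ("fixed k=3", fixed_depth(3)),
                      ("adaptive", adaptive)]:
        print(name, rollout(0, 5, pol, max_replans=10))
```

In this toy, depth 1 reaches the goal but replans before every action, depth 3 repeatedly overshoots and never lands on the goal, and the state-conditioned depth reaches it with a single replan. The sketch omits execution noise, so it does not model the compounding-error side of the trade-off beyond overshoot.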
