Forest Before Trees: Latent Superposition for Efficient Visual Reasoning

Yubo Wang; Juntian Zhang; Yichen Wu; Yankai Lin; Nils Lukas; Yuhan Liu

arXiv:2601.06803·cs.CL·April 21, 2026

TL;DR

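Laser is a latent reasoning method for vision-language models that aligns each latent state with a dynamic window of future semantics, so the model holds global features before narrowing down to local details. It surpasses Monet by 5.03% on average across 6 benchmarks while reducing inference tokens by over 97%.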

Contribution

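The paper presents Laser, a latent reasoning paradigm trained with Dynamic Windowed Alignment Learning (DWAL). Rather than forcing a point-wise prediction, DWAL aligns the latent state with a dynamic validity window of future semantics, which avoids premature semantic collapse while keeping the reasoning interpretable.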

Findings

01 Laser surpasses Monet by 5.03% on average across 6 benchmarks.

02 Laser reduces inference tokens by over 97%.

03 Laser generalizes robustly to out-of-distribution data.

Abstract

While Chain-of-Thought empowers Large Vision-Language Models with multi-step reasoning, explicit textual rationales suffer from an information bandwidth bottleneck, where continuous visual details are discarded during discrete tokenization. Recent latent reasoning methods attempt to address this challenge, but often fall prey to premature semantic collapse due to rigid autoregressive objectives. In this paper, we propose Laser, a novel paradigm that reformulates visual deduction via Dynamic Windowed Alignment Learning (DWAL). Instead of forcing a point-wise prediction, Laser aligns the latent state with a dynamic validity window of future semantics. This mechanism enforces a "Forest-before-Trees" cognitive hierarchy, enabling the model to maintain a probabilistic superposition of global features before narrowing down to local details. Crucially, Laser maintains interpretability via…
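The contrast the abstract draws between a point-wise prediction and alignment with a window of future semantics can be illustrated with a toy loss. The sketch below is only an assumption about the general idea, not the paper's DWAL objective: the function names `pointwise_loss` and `windowed_loss`, the fixed `window` argument, and the choice to score the total probability mass placed on the tokens inside the window are all hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def pointwise_loss(logits, next_token):
    """Standard autoregressive target: only the single next token counts."""
    return -log_softmax(logits)[next_token]


def windowed_loss(logits, future_tokens, window):
    """Hypothetical windowed target: reward probability mass placed on ANY
    of the next `window` future tokens, so the state may stay spread over
    several upcoming concepts instead of committing to one."""
    targets = set(future_tokens[:window])
    logp = log_softmax(logits)
    mass = sum(math.exp(logp[t]) for t in targets)
    return -math.log(mass)


# Toy vocabulary of 4 tokens; the state is spread evenly over tokens 0, 1, 2.
logits = [1.0, 1.0, 1.0, -2.0]
future = [0, 1, 2]  # hypothetical upcoming token ids

strict = pointwise_loss(logits, future[0])
relaxed = windowed_loss(logits, future, window=3)
print(f"point-wise: {strict:.4f}  windowed: {relaxed:.4f}")
```

With a window of one token the two losses coincide; widening the window stops penalizing a state that covers several upcoming tokens at once, which is one simple way to picture the "superposition" the abstract describes. A dynamic variant would vary `window` across reasoning steps, but how the paper sets it is not stated in the text above.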

Code & Models

Repositories

ybb6/laser (GitHub)
