TL;DR
WISE introduces a structured reasoning paradigm that compresses detailed rationales into concise summaries, enabling efficient language-guided segmentation with state-of-the-art zero-shot performance and substantially shorter reasoning.
Contribution
The paper proposes WISE, a novel training and inference framework that compresses reasoning into concise rationales, improving efficiency without sacrificing accuracy.
Findings
Achieves state-of-the-art zero-shot performance on ReasonSeg with 58.3 cIoU.
Reduces reasoning length by nearly 5 times (about 4.9x), from 112 to 23 tokens.
Demonstrates effective reasoning compression and a robust inference strategy.
Abstract
Chain-of-thought (CoT) reasoning has significantly improved the performance of large multimodal models in language-guided segmentation, yet its prohibitive computational cost, stemming from generating verbose rationales, limits real-world applicability. We introduce WISE (Wisdom from Internal Self-Exploration), a novel paradigm for efficient reasoning guided by the principle of "thinking twice: once for learning, once for speed". WISE trains a model to generate a structured sequence: a concise rationale, the final answer, and then a detailed explanation. By placing the concise rationale first, our method leverages autoregressive conditioning to enforce that the concise rationale acts as a sufficient summary for generating the detailed explanation. This structure is reinforced by a self-distillation objective that jointly rewards semantic fidelity and conciseness, compelling the…
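The output ordering and the two-part reward described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation. The tag names, the word-overlap fidelity proxy, the length-ratio conciseness term, and the weight `alpha` are all assumptions made here for illustration, since the abstract does not specify how fidelity or conciseness are measured.

```python
def build_target(concise: str, answer: str, detailed: str) -> str:
    """Arrange one training target in the order the abstract describes:
    concise rationale, then final answer, then detailed explanation.
    The tag names are hypothetical placeholders."""
    return (
        f"<concise>{concise}</concise>"
        f"<answer>{answer}</answer>"
        f"<detail>{detailed}</detail>"
    )


def fidelity(concise: str, detailed: str) -> float:
    """Toy stand-in for semantic fidelity: the fraction of distinct words
    in the detailed explanation that also appear in the concise rationale."""
    detailed_words = set(detailed.lower().split())
    if not detailed_words:
        return 0.0
    concise_words = set(concise.lower().split())
    return len(concise_words & detailed_words) / len(detailed_words)


def conciseness(concise: str, detailed: str) -> float:
    """Toy conciseness score: 1 minus the length ratio, clipped to [0, 1].
    Lengths are whitespace-separated word counts, not model tokens."""
    detailed_len = len(detailed.split())
    if detailed_len == 0:
        return 0.0
    ratio = len(concise.split()) / detailed_len
    return max(0.0, 1.0 - ratio)


def reward(concise: str, detailed: str, alpha: float = 0.5) -> float:
    """Weighted sum of the two toy terms; alpha is an assumed weight."""
    return alpha * fidelity(concise, detailed) + (1.0 - alpha) * conciseness(
        concise, detailed
    )


detailed = "the red mug on the left is the only object you can drink from"
short = "red mug on the left"
unrelated = "blue sky"

print(build_target(short, "mask_0", detailed))
print(reward(short, detailed), reward(detailed, detailed), reward(unrelated, detailed))
```

Under these toy definitions, copying the detailed explanation verbatim earns full fidelity but no conciseness credit, and an unrelated short rationale earns the reverse. A short rationale that covers the key content scores higher than either. This is the trade-off a joint fidelity and conciseness reward is meant to create.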
