SteerSeg: Attention Steering for Reasoning Video Segmentation

Ali Cheraghian; Hamidreza Dastmalchi; Abdelwahed Khamis; Morteza Saberi; Aijun An; Lars Petersson

arXiv:2605.14908 · cs.CV · May 15, 2026

TL;DR

SteerSeg improves language-driven video object segmentation by steering the attention maps of a frozen large vision-language model with learnable soft prompts and reasoning-guided Chain-of-Thought prompts. This sharpens spatial grounding without retraining the model.

Contribution

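It identifies attention misalignment as the key bottleneck in attention-based grounding. It then introduces an input-level attention steering method that combines learnable soft prompts with Chain-of-Thought prompting to sharpen spatial localization in video segmentation.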
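This page does not specify where SteerSeg places its soft prompts, which LVLM it uses, or how the prompts are trained. The toy below is therefore only a minimal sketch of why input-level prompts can reshape attention in a frozen model.

It assumes a tiny two-layer causal transformer with random "frozen" weights and a hypothetical token layout of [soft prompts | image tokens | text tokens]. It compares the final-layer attention that the last text token pays to image tokens, with and without prepended prompt vectors. The helper names (`causal_attention_layer`, `image_attention_map`) are illustrative, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_attention_layer(h, wq, wk, wv):
    """Single-head causal self-attention with a residual connection."""
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(h.shape[-1])
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)   # each token sees only itself and earlier tokens
    attn = softmax(scores)
    return h + attn @ v, attn

def image_attention_map(prompts, image_tokens, text_tokens, layers):
    """Final-layer attention from the last text token to the image tokens.

    Hypothetical layout: [soft prompts | image tokens | text tokens].
    """
    h = np.concatenate([prompts, image_tokens, text_tokens], axis=0)
    start = prompts.shape[0]
    stop = start + image_tokens.shape[0]
    attn = None
    for wq, wk, wv in layers:                    # frozen weights: never modified here
        h, attn = causal_attention_layer(h, wq, wk, wv)
    weights = attn[-1, start:stop]
    return weights / weights.sum()               # renormalize over image tokens only

rng = np.random.default_rng(0)
d, n_img, n_txt, n_prompts = 16, 9, 4, 3
# Two layers of random weights standing in for a pretrained, frozen LVLM.
frozen_layers = [tuple(rng.normal(scale=0.3, size=(d, d)) for _ in range(3)) for _ in range(2)]
image_tokens = rng.normal(size=(n_img, d))
text_tokens = rng.normal(size=(n_txt, d))

baseline = image_attention_map(np.zeros((0, d)), image_tokens, text_tokens, frozen_layers)
soft_prompts = rng.normal(size=(n_prompts, d))   # in prompt tuning, only these would be trained
steered = image_attention_map(soft_prompts, image_tokens, text_tokens, frozen_layers)
print("L1 shift in image attention:", np.abs(baseline - steered).sum())
```

In a single layer, extra prompt keys would only absorb attention mass, leaving the renormalized distribution over image tokens unchanged. With two layers, the image tokens and the text query both attend to the prompts first, so their hidden states change and the second layer's attention over image tokens shifts. Optimizing the prompt vectors by gradient descent, which is not shown here, is what would turn this shift into deliberate steering.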

Findings

01

Significantly improves grounding accuracy on multiple benchmarks.

02

Maintains pretrained reasoning capabilities while enhancing spatial localization.

03

Generalizes well across diverse video segmentation datasets.

Abstract

Video reasoning segmentation requires localizing objects across video frames from natural language expressions, often involving spatial reasoning and implicit references. Recent approaches leverage frozen large vision-language models (LVLMs) by extracting attention maps and using them as spatial priors for segmentation, enabling training-free grounding. However, these attention maps are optimized for text generation rather than spatial localization, often resulting in diffuse and ambiguous grounding signals. In this work, we introduce SteerSeg, a lightweight framework that identifies attention misalignment as the key bottleneck in attention-based grounding and proposes to steer attention at its source through input-level conditioning. SteerSeg combines learnable soft prompts with reasoning-guided Chain-of-Thought (CoT) prompting. The soft prompts reshape the attention distribution to…
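The abstract describes using a frozen LVLM's attention maps as spatial priors and notes that these maps are often diffuse. The sketch below makes those two ideas concrete under stated assumptions:

- It assumes the attention that a query token pays to the image patches of one frame is available as a flat vector over a `grid_h × grid_w` patch grid.
- It reshapes that vector into a 2-D prior and finds the most-attended patch.
- It measures diffuseness with normalized entropy. This metric is an illustrative choice, not one stated in the paper.

All function names are hypothetical. How SteerSeg converts the prior into masks is not described on this page.

```python
import numpy as np

def attention_to_prior(weights, grid_h, grid_w):
    """Reshape attention over image patches into a grid_h x grid_w map scaled to [0, 1]."""
    grid = np.asarray(weights, dtype=float).reshape(grid_h, grid_w)
    lo, hi = grid.min(), grid.max()
    if hi == lo:
        return np.zeros_like(grid)               # a perfectly flat map carries no location signal
    return (grid - lo) / (hi - lo)

def normalized_entropy(weights):
    """Entropy of the attention distribution divided by its maximum, log(N).

    Near 1.0: diffuse (spread over all patches). Near 0.0: sharply peaked.
    """
    p = np.asarray(weights, dtype=float).ravel()
    if p.size == 1:
        return 0.0
    p = p / p.sum()
    nz = p[p > 0]                                # 0 * log(1/0) is taken as 0
    return float((nz * np.log(1.0 / nz)).sum() / np.log(p.size))

def peak_location(weights, grid_h, grid_w):
    """(row, col) of the most-attended patch, e.g. a candidate point prompt."""
    row, col = np.unravel_index(int(np.argmax(weights)), (grid_h, grid_w))
    return int(row), int(col)

diffuse = np.full(16, 1 / 16)                    # attention spread evenly over a 4x4 grid
peaked = np.zeros(16)
peaked[6] = 1.0                                  # all attention on a single patch
for name, w in [("diffuse", diffuse), ("peaked", peaked)]:
    print(name, round(normalized_entropy(w), 3), peak_location(w, 4, 4))
```

A diffuse map scores near 1.0 and yields an arbitrary peak, which is the kind of ambiguous grounding signal the abstract describes. A sharply peaked map scores near 0.0 and points at a single patch.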

Code & Models

Repositories

https://steerseg.github.io