StAR: Segment Anything Reasoner
Seokju Yun, Dongheon Lee, Noori Bae, Jaesung Jun, Chanseul Cho, Youngmin Ro

TL;DR
StAR is a reasoning segmentation framework that refines training design choices, introduces parallel test-time scaling, and adds a new benchmark dataset, improving over recent baselines with only 5k training samples.
Contribution
This work introduces StAR, a comprehensive reasoning framework with innovative training and testing strategies, and a new dataset for systematic evaluation of visual reasoning methods.
Findings
StAR achieves significant performance improvements over baseline models.
Parallel test-time scaling further boosts segmentation accuracy.
The approach activates latent reasoning capabilities with only 5k training samples.
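To make the idea of parallel test-time scaling concrete, here is a minimal sketch of one common way to combine several independently sampled segmentation candidates: a pixel-wise majority vote. This is an assumption for illustration only. This summary does not describe how StAR aggregates its parallel samples, and the names `majority_vote` and `iou` and the toy masks are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values


def majority_vote(masks: List[Mask]) -> Mask:
    """Fuse equally sized binary masks by strict pixel-wise majority.

    Illustrative aggregation rule only; not taken from the StAR paper.
    """
    if not masks:
        raise ValueError("need at least one candidate mask")
    n = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    fused: Mask = []
    for r in range(height):
        row = []
        for c in range(width):
            votes = sum(m[r][c] for m in masks)
            row.append(1 if 2 * votes > n else 0)  # keep pixel if > half agree
        fused.append(row)
    return fused


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 1.0  # two empty masks count as identical


# Three hypothetical candidates sampled in parallel for the same query.
candidates = [
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 1, 0], [0, 0, 0]],
]
fused_mask = majority_vote(candidates)
```

The vote keeps a pixel only when more than half of the candidates mark it, so isolated errors in a single sample are suppressed. Other aggregation rules, such as selecting the candidate with the highest average IoU against the rest, would fit the same interface.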
Abstract
As AI systems are integrated more rapidly into diverse and complex real-world environments, the ability to reason holistically over an implicit query and an image to localize a target is becoming increasingly important. However, recent reasoning segmentation methods fail to sufficiently elicit the visual reasoning capabilities of the base model. In this work, we present Segment Anything Reasoner (StAR), a comprehensive framework that refines the design space from multiple perspectives, including the parameter-tuning scheme, reward functions, learning strategies, and answer format, and achieves substantial improvements over recent baselines. In addition, for the first time, we successfully introduce parallel test-time scaling to the segmentation task, pushing the performance boundary even further. To extend the scope and depth of reasoning covered by existing benchmarks, we also…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Graph Neural Networks
