SENSE: Stereo OpEN Vocabulary SEmantic Segmentation

Thomas Campagnolo (ACENTAURI); Ezio Malis (ACENTAURI); Philippe Martinet (ACENTAURI); Gaétan Bahl

arXiv:2604.15946 · cs.CV · April 20, 2026

TL;DR

SENSE brings stereo vision to open-vocabulary semantic segmentation, combining stereo image pairs with vision-language models to achieve higher accuracy and better spatial reasoning, especially in complex scenes with occlusions.

Contribution

SENSE is the first work to leverage stereo vision for open-vocabulary semantic segmentation, improving spatial reasoning and generalization in zero-shot settings.
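The spatial advantage of stereo rests on a standard geometric fact: in a rectified image pair, a point's horizontal shift between the two views (its disparity d, in pixels) is inversely proportional to its depth, Z = f·B/d, where f is the focal length in pixels and B the camera baseline. The sketch below illustrates only that textbook relation, not SENSE's architecture; the function name `disparity_to_depth`, the `min_disparity` cutoff, and the camera values are hypothetical.

```python
from typing import List, Optional

def disparity_to_depth(
    disparity: List[List[float]],
    focal_px: float,
    baseline_m: float,
    min_disparity: float = 1e-6,
) -> List[List[Optional[float]]]:
    """Convert a disparity map (in pixels) to metric depth with Z = f * B / d.

    Assumes a rectified stereo pair, so matching pixels lie on the same row.
    Pixels whose disparity is at or below `min_disparity` (no match, or a
    point at infinity) get None instead of a depth value.
    """
    return [
        [focal_px * baseline_m / d if d > min_disparity else None for d in row]
        for row in disparity
    ]

# Hypothetical camera: 720 px focal length, 0.54 m baseline (illustrative values).
disp = [[36.0, 18.0],
        [9.0, 0.0]]
depth = disparity_to_depth(disp, focal_px=720.0, baseline_m=0.54)
print(depth)  # larger disparity -> smaller depth (closer surface)
```

Because disparity falls as depth grows, comparing disparities on either side of an object boundary indicates which surface is in front, an ordering cue that a single image does not directly provide.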

Findings

01. +2.9% AP on PhraseStereo over the baseline
02. +0.76% AP on PhraseStereo over the best competing method
03. +3.5% mIoU on Cityscapes and +18% on KITTI
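The Cityscapes figure is reported in mean Intersection-over-Union (mIoU). As a reminder of what that metric measures, here is a minimal sketch of per-class IoU (true positives over true positives plus false positives plus false negatives) averaged across classes, on flattened label lists. It is not the paper's evaluation code; skipping classes that appear in neither prediction nor ground truth, and the optional `ignore_index`, are assumptions, since benchmarks differ on these details.

```python
from typing import Optional, Sequence

def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: Optional[int] = None,
) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if ignore_index is not None and t == ignore_index:
            continue  # pixel excluded from evaluation
        if p == t:
            intersection[t] += 1  # true positive for class t
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0

# Six pixels, three classes (toy labels).
print(mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3))
```

Here class 0 scores 1/3, class 1 scores 2/3 and class 2 scores 1/2, so the mean is 0.5. Real benchmarks usually accumulate the counts over the whole dataset before dividing, which this function also does if every image's pixels are concatenated into one pair of lists.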

Abstract

Open-vocabulary semantic segmentation enables models to segment objects or image regions beyond fixed class sets, offering flexibility in dynamic environments. However, existing methods often rely on single-view images and struggle with spatial precision, especially under occlusions and near object boundaries. We propose SENSE, the first work on Stereo OpEN Vocabulary SEmantic Segmentation, which leverages stereo vision and vision-language models to enhance open-vocabulary semantic segmentation. By incorporating stereo image pairs, we introduce geometric cues that improve spatial reasoning and segmentation accuracy. Trained on the PhraseStereo dataset, our approach achieves strong performance in phrase-grounded tasks and demonstrates generalization in zero-shot settings. On PhraseStereo, we show a +2.9% improvement in Average Precision over the baseline method and +0.76% over the best…
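Open-vocabulary segmentation, as described above, labels regions with free-form text rather than a fixed class index. A common way to do this with vision-language models is to compare per-pixel image embeddings against text embeddings of the query phrases by cosine similarity. The sketch below shows only that final matching step, with tiny hand-written vectors standing in for real encoder outputs; it is a generic illustration, not SENSE's method, and it does not show how stereo features are fused. The names `label_pixels` and `threshold`, and all vector values, are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def label_pixels(
    pixel_embeddings: List[List[Vector]],
    text_embeddings: Dict[str, Vector],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Give each pixel its best-matching phrase, or 'unlabeled' below `threshold`."""
    labels = []
    for row in pixel_embeddings:
        out_row = []
        for feat in row:
            scores = {name: cosine(feat, emb) for name, emb in text_embeddings.items()}
            best = max(scores, key=scores.get)  # highest-similarity phrase
            out_row.append(best if scores[best] >= threshold else "unlabeled")
        labels.append(out_row)
    return labels

# Toy 3-D embeddings standing in for real encoder outputs (hypothetical values).
texts = {"red car": [1.0, 0.0, 0.0], "pedestrian": [0.0, 1.0, 0.0]}
pixels = [[[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]],
          [[0.0, 0.0, 1.0], [0.7, 0.6, 0.0]]]
print(label_pixels(pixels, texts))
```

Because the phrase list is only an input, new categories can be queried at test time without retraining. A real model would compute all pixel-to-phrase scores at once as a matrix product; the explicit loop here trades speed for readability.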
