OV-COAST: Cost Aggregation with Optimal Transport for Open-Vocabulary Semantic Segmentation
Aditya Gandhamal, Aniruddh Sikdar, Suresh Sundaram

TL;DR
OV-COAST introduces a novel cost aggregation method using optimal transport to improve open-vocabulary semantic segmentation, enhancing out-of-domain generalization and outperforming existing models on the MESS benchmark.
Contribution
The paper proposes a cost aggregation approach based on optimal transport for open-vocabulary semantic segmentation (OVSS). It uses the cost volume as the transport cost matrix and the Sinkhorn distance to better align visual-language features.
Findings
Achieves 1.72% higher mIoU than CAT-Seg.
Surpasses SAN-B by 4.9% mIoU on MESS.
Improves out-of-domain generalization in OVSS.
Abstract
Open-vocabulary semantic segmentation (OVSS) entails assigning semantic labels to each pixel in an image using textual descriptions, typically leveraging world models such as CLIP. To enhance out-of-domain generalization, we propose Cost Aggregation with Optimal Transport (OV-COAST) for open-vocabulary semantic segmentation. To align visual-language features within the framework of optimal transport theory, we employ cost volume to construct a cost matrix, which quantifies the distance between two distributions. Our approach adopts a two-stage optimization strategy: in the first stage, the optimal transport problem is solved using cost volume via Sinkhorn distance to obtain an alignment solution; in the second stage, this solution is used to guide the training of the CAT-Seg model. We evaluate state-of-the-art OVSS models on the MESS benchmark, where our approach notably improves the…
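The first stage described in the abstract (solving an optimal transport problem over a cost matrix with Sinkhorn iterations) can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the cosine-distance cost, the uniform marginals, the regularisation strength `eps`, the iteration count, and the function names `cosine_cost` and `sinkhorn` are all assumptions made for illustration. The second stage, in which the solution guides CAT-Seg training, is not shown.

```python
import math

def cosine_cost(pixel_feats, text_feats):
    """Assumed cost: C[i][j] = 1 - cosine similarity of pixel i and class j."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero vectors
        return [x / norm for x in v]
    pixels = [unit(v) for v in pixel_feats]
    texts = [unit(v) for v in text_feats]
    return [[1.0 - sum(a * b for a, b in zip(p, t)) for t in texts]
            for p in pixels]

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularised OT with uniform marginals (an assumption).

    Runs a fixed number of iterations, so the row marginals are only
    approximately satisfied. Returns (transport_plan, transport_cost).
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n                      # mass per pixel
    b = [1.0 / m] * m                      # mass per class
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    dist = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, dist

# Toy example: 4 pixel embeddings, 2 class (text) embeddings.
pixel_feats = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
text_feats = [[1.0, 0.0], [0.0, 1.0]]
plan, dist = sinkhorn(cosine_cost(pixel_feats, text_feats))
labels = [row.index(max(row)) for row in plan]  # per-pixel class by max mass
```

In this toy setup each row of `plan` spreads one pixel's mass over the classes, so taking the row-wise maximum gives a hard assignment. Smaller values of `eps` make the plan sharper but slow the convergence of the iterations.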
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Natural Language Processing Techniques
