Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
Lin Chen, Qi Yang, Kun Ding, Zhihao Li, Gang Shen, Fei Li, Qiyuan Cao, and Shiming Xiang

TL;DR
ERR-Seg introduces a lightweight, efficient approach for open-vocabulary semantic segmentation by reducing redundancy in cost maps and aggregation, significantly improving speed and memory efficiency while maintaining high accuracy.
Contribution
The paper proposes ERR-Seg, a novel architecture that reduces redundancy in cost maps and aggregation, leading to faster and more memory-efficient open-vocabulary semantic segmentation.
Findings
Achieves a 5.6% performance improvement on the ADE20K-847 benchmark.
Realizes a 3.1× speedup over previous methods.
Maintains accuracy while significantly reducing computational costs.
Abstract
Open-vocabulary semantic segmentation (OVSS) is an open-world task that aims to assign each pixel within an image to a specific class defined by arbitrary text descriptions. While large-scale vision-language models have shown remarkable open-vocabulary capabilities, their image-level pretraining limits effectiveness on pixel-wise dense prediction tasks like OVSS. Recent cost-based methods narrow this granularity gap by constructing pixel-text cost maps and refining them via cost aggregation mechanisms. Despite achieving promising performance, these approaches suffer from high computational costs and long inference latency. In this paper, we identify two major sources of redundancy in the cost-based OVSS framework: redundant information introduced during cost map construction and inefficient sequence modeling in cost aggregation. To address these issues, we propose ERR-Seg, an efficient…
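To make the term "pixel-text cost map" concrete, here is a minimal sketch of how such a map is commonly built: cosine similarity between each pixel embedding and each class text embedding. This is not the ERR-Seg implementation. The function name pixel_text_cost_map is hypothetical, and random arrays stand in for the outputs of a vision-language model's image and text encoders.

```python
import numpy as np

def pixel_text_cost_map(pixel_feats, text_feats, eps=1e-8):
    """Cosine similarity between every pixel and every class description.

    pixel_feats: (H, W, D) dense image embeddings (assumed encoder output)
    text_feats:  (C, D) one embedding per class description (assumed)
    returns:     (H, W, C) cost map with values in [-1, 1]
    """
    # L2-normalize along the embedding dimension so the dot product is a cosine.
    p = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + eps)
    t = text_feats / (np.linalg.norm(text_feats, axis=-1, keepdims=True) + eps)
    # Contract over D: one similarity score per (pixel, class) pair.
    return np.einsum("hwd,cd->hwc", p, t)

# Illustrative stand-in data: a 4x6 feature grid, 5 classes, 16-dim embeddings.
rng = np.random.default_rng(0)
pixel_feats = rng.standard_normal((4, 6, 16))
text_feats = rng.standard_normal((5, 16))

# Plant a pixel whose embedding matches class 2 exactly.
pixel_feats[0, 0] = text_feats[2]

cost = pixel_text_cost_map(pixel_feats, text_feats)
# Without any aggregation, the crudest prediction is a per-pixel argmax.
labels = cost.argmax(axis=-1)
```

The cost map holds H × W × C entries, so its size grows linearly with the number of candidate classes. With a large vocabulary such as the 847 classes of ADE20K-847, every later aggregation step has to process a correspondingly large tensor.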
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Contrastive Language-Image Pre-training
