Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation

Lin Chen; Qi Yang; Kun Ding; Zhihao Li; Gang Shen; Fei Li; Qiyuan Cao; and Shiming Xiang

arXiv:2501.17642·cs.CV·December 23, 2025



TL;DR

ERR-Seg is a lightweight approach to open-vocabulary semantic segmentation that reduces redundancy in cost map construction and cost aggregation, improving speed and memory efficiency while maintaining high accuracy.

Contribution

The paper proposes ERR-Seg, an architecture that reduces redundancy in both cost map construction and cost aggregation, making open-vocabulary semantic segmentation faster and more memory-efficient.

Findings

01 Achieves a 5.6% performance improvement on the ADE20K-847 benchmark.

02 Runs 3.1x faster than previous methods.

03 Maintains accuracy while significantly reducing computational cost.

Abstract

Open-vocabulary semantic segmentation (OVSS) is an open-world task that aims to assign each pixel within an image to a specific class defined by arbitrary text descriptions. While large-scale vision-language models have shown remarkable open-vocabulary capabilities, their image-level pretraining limits their effectiveness on pixel-wise dense prediction tasks such as OVSS. Recent cost-based methods narrow this granularity gap by constructing pixel-text cost maps and refining them via cost aggregation mechanisms. Despite achieving promising performance, these approaches suffer from high computational costs and long inference latency. In this paper, we identify two major sources of redundancy in the cost-based OVSS framework: redundant information introduced during cost map construction and inefficient sequence modeling in cost aggregation. To address these issues, we propose ERR-Seg, an efficient…
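
To make the term "pixel-text cost map" concrete, here is a minimal sketch of the generic construction used by cost-based methods: cosine similarity between every pixel embedding and every class text embedding. It is not ERR-Seg's implementation. The function name `pixel_text_cost_map`, the tensor shapes, and the random stand-in features are all assumptions for illustration; in practice the embeddings would come from a vision-language model.

```python
import numpy as np

def pixel_text_cost_map(pixel_feats, text_feats, eps=1e-8):
    """Cosine similarity between each pixel and each class description.

    pixel_feats: (H, W, D) dense image embeddings (hypothetical input)
    text_feats:  (C, D) one embedding per class description (hypothetical input)
    returns:     (H, W, C) cost map, one channel per class
    """
    # L2-normalize along the embedding dimension so the dot product is a cosine.
    p = pixel_feats / (np.linalg.norm(pixel_feats, axis=-1, keepdims=True) + eps)
    t = text_feats / (np.linalg.norm(text_feats, axis=-1, keepdims=True) + eps)
    # (H, W, D) @ (D, C) -> (H, W, C)
    return p @ t.T

# Random stand-ins for real embeddings (assumption: D = 16, C = 5 classes).
rng = np.random.default_rng(0)
pixels = rng.normal(size=(4, 6, 16))
texts = rng.normal(size=(5, 16))

cost = pixel_text_cost_map(pixels, texts)
# Unrefined prediction: pick the best-matching class per pixel.
pred = cost.argmax(axis=-1)
```

Because the cost map carries one channel per class, its size grows linearly with the vocabulary, which gives a sense of why a benchmark such as ADE20K-847 is expensive for methods that then run aggregation over the full map.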


Code & Models

Repositories

lchen1019/ERR-Seg


Taxonomy

Topics: Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training