PCA-Seg: Revisiting Cost Aggregation for Open-Vocabulary Semantic and Part Segmentation
Jianjian Yin, Tao Chen, Yi Chen, Gensheng Pei, Xiangbo Shu, Yazhou Yao, Fumin Shen

TL;DR
PCA-Seg introduces a parallel cost aggregation framework with an expert-driven perceptual learning module and feature orthogonalization to improve open-vocabulary segmentation, achieving state-of-the-art results with minimal additional parameters.
Contribution
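The paper proposes a parallel cost aggregation paradigm, with an expert-driven perceptual learning (EPL) module and a feature-orthogonalization strategy (FOD), to better capture vision-language alignment in open-vocabulary semantic and part segmentation (OSPS).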
Findings
Achieves state-of-the-art performance on eight benchmarks.
Adds only 0.35M parameters per parallel block.
Effectively integrates semantic and contextual features.
Abstract
Recent advances in vision-language models (VLMs) have garnered substantial attention in open-vocabulary semantic and part segmentation (OSPS). However, existing methods extract image-text alignment cues from cost volumes through a serial structure of spatial and class aggregations, leading to knowledge interference between class-level semantics and spatial context. Therefore, this paper proposes a simple yet effective parallel cost aggregation (PCA-Seg) paradigm to alleviate the above challenge, enabling the model to capture richer vision-language alignment information from cost volumes. Specifically, we design an expert-driven perceptual learning (EPL) module that efficiently integrates semantic and contextual streams. It incorporates a multi-expert parser to extract complementary features from multiple perspectives. In addition, a coefficient mapper is designed to adaptively learn…
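To make the serial-versus-parallel distinction in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The mean filter, the softmax, the fixed mixing weight `alpha`, and all function names are illustrative assumptions standing in for the learned aggregation layers, the EPL module, and the coefficient mapper, whose details the abstract does not give.

```python
import numpy as np

def cost_volume(img_feats, txt_feats):
    """Cosine similarity between every pixel embedding and every class embedding.

    img_feats: (H, W, D) image features; txt_feats: (C, D) text features.
    Returns a cost volume of shape (H, W, C).
    """
    img = img_feats / np.linalg.norm(img_feats, axis=-1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=-1, keepdims=True)
    return img @ txt.T

def spatial_aggregate(cost):
    """Assumed stand-in for spatial aggregation: 3x3 mean filter per class."""
    H, W, _ = cost.shape
    padded = np.pad(cost, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(cost)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + H, dx:dx + W]
    return out / 9.0

def class_aggregate(cost):
    """Assumed stand-in for class aggregation: softmax over classes per pixel."""
    shifted = cost - cost.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def serial(cost):
    """Serial structure: class aggregation only sees spatially mixed costs."""
    return class_aggregate(spatial_aggregate(cost))

def parallel(cost, alpha=0.5):
    """Parallel structure: both branches read the raw cost volume.

    alpha is a hypothetical fixed weight; the paper learns its coefficients.
    """
    return alpha * spatial_aggregate(cost) + (1.0 - alpha) * class_aggregate(cost)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cost = cost_volume(rng.normal(size=(4, 5, 8)), rng.normal(size=(3, 8)))
    print(serial(cost).shape, parallel(cost).shape)
```

In the serial version the class branch never sees the original cost volume, which is one way to picture the interference the abstract describes. In the parallel version each branch works on the same input and the outputs are merged afterwards.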
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Natural Language Processing Techniques
