Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models
Muhammad Atta ur Rahman, Dooseop Choi, Seung-Ik Lee, KyoungWook Min

TL;DR
This paper introduces 'Beyond-Labels', a lightweight fusion module that enhances open-vocabulary semantic segmentation by leveraging pre-trained vision-language models and Fourier embeddings, achieving improved performance with minimal retraining.
Contribution
The study proposes a novel fusion module and positional encoding method that enable efficient adaptation of pre-trained models for open-vocabulary segmentation tasks.
Findings
Outperforms existing methods on the PASCAL-5i benchmark
Uses minimal additional training data and computation
Improves generalization with Fourier positional embeddings
Abstract
Open-vocabulary semantic segmentation attempts to classify and outline objects in an image using arbitrary text labels, including those unseen during training. Self-supervised models, when trained effectively, can address many visual and linguistic processing problems. This study investigates simple yet efficient methods for adapting pre-trained foundation models to open-vocabulary semantic segmentation tasks. Our research proposes "Beyond-Labels", a lightweight transformer-based fusion module that uses a small amount of image segmentation data to fuse frozen visual representations with language concepts. This strategy allows the model to leverage the extensive knowledge of pre-trained models without requiring significant retraining, making the approach data-efficient and scalable. Furthermore, we capture positional information in images using Fourier embeddings, improving…
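The abstract mentions Fourier embeddings for positional information but is cut off before giving details. As a rough illustration of the general idea only, the sketch below encodes each pixel position with sine and cosine terms at several frequencies. The function names, the power-of-two frequency schedule, and the `num_bands` parameter are assumptions made for this example and are not taken from the paper.

```python
import math


def fourier_features(y, x, num_bands=4):
    """Encode one normalized position (y, x in [-1, 1]) as sin/cos features.

    Hypothetical helper: frequencies are pi * 2**k for k = 0..num_bands-1,
    a common choice, not necessarily the one used in the paper.
    """
    feats = []
    for coord in (y, x):
        for k in range(num_bands):
            angle = math.pi * (2.0 ** k) * coord
            feats.append(math.sin(angle))
            feats.append(math.cos(angle))
    return feats  # length is 4 * num_bands


def position_grid(height, width, num_bands=4):
    """Build a height x width grid of Fourier position features."""

    def norm(i, n):
        # Map index i in [0, n - 1] to [-1, 1]; a single row or column maps to 0.
        return 0.0 if n == 1 else 2.0 * i / (n - 1) - 1.0

    return [
        [fourier_features(norm(r, height), norm(c, width), num_bands)
         for c in range(width)]
        for r in range(height)
    ]


grid = position_grid(5, 5, num_bands=4)
print(len(grid), len(grid[0]), len(grid[0][0]))
```

In a pipeline like the one described, features of this kind would typically be added to or concatenated with the frozen visual tokens before fusion with the text embeddings, though how the paper combines them is not stated in the text above.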
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications
