Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction

Yunheng Li; Yuxuan Li; Quansheng Zeng; Wenhai Wang; Qibin Hou; Ming-Ming Cheng

arXiv:2412.06244·cs.CV·December 25, 2025

TL;DR

DenseVLM introduces an unbiased region-language alignment framework that enhances open-vocabulary dense prediction tasks by reducing foreground bias and leveraging pre-trained vision-language models for improved zero-shot performance.

Contribution

The paper proposes DenseVLM, a novel framework that learns unbiased region-language alignment from pre-trained VLMs, improving dense prediction tasks and zero-shot scalability.

Findings

01. Significant performance improvements in object detection and segmentation.

02. Effective reduction of foreground bias in dense prediction.

03. Enhanced zero-shot generalization on diverse datasets.

Abstract

Pre-trained vision-language models (VLMs), such as CLIP, have demonstrated impressive zero-shot recognition capability, but still underperform in dense prediction tasks. Self-distillation has recently emerged as a promising approach for fine-tuning VLMs to better adapt to local regions without requiring extensive annotations. However, previous state-of-the-art approaches often suffer from significant "foreground bias", where models tend to wrongly identify background regions as foreground objects. To alleviate this issue, we propose DenseVLM, a framework designed to learn unbiased region-language alignment from powerful pre-trained VLM representations. DenseVLM leverages the pre-trained VLM to retrieve categories for unlabeled…
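As a rough illustration of what CLIP-style zero-shot category retrieval can look like, the sketch below assigns each region the category whose text embedding is most similar under cosine similarity. It is not DenseVLM's implementation: the function names, the two-dimensional toy embeddings, and the category list are all hypothetical, and a real system would obtain the embeddings from a VLM's image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_categories(region_embeddings, text_embeddings, category_names):
    """Return, for each region, the name of the most similar category.

    Hypothetical helper: assumes region and text embeddings already live
    in the same space, as they would after encoding with a CLIP-like VLM.
    """
    labels = []
    for region in region_embeddings:
        scores = [cosine(region, text) for text in text_embeddings]
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(category_names[best])
    return labels


# Toy stand-ins for encoder outputs (assumed values, not real features).
category_names = ["cat", "sky"]
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]
region_embeddings = [[0.9, 0.1], [0.2, 0.8]]

labels = retrieve_categories(region_embeddings, text_embeddings, category_names)
print(labels)
```

In this toy setup the first region lies closest to the "cat" text vector and the second to "sky", so each region receives a label without any region-level annotation.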

Code & Models

Models

lyhisme/DenseVLM (Hugging Face model)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Contrastive Language-Image Pre-training