Text-promptable Object Counting via Quantity Awareness Enhancement
Miaojing Shi, Xiaowen Zhang, Zijie Yue, Yong Luo, Cairong Zhao, Li Li

TL;DR
QUANet enhances large vision-language models for object counting by introducing quantity-aware prompts, a dual-stream decoder with knowledge sharing, and a ranking loss, achieving strong zero-shot generalization on multiple benchmarks.
Contribution
The paper proposes QUANet, a novel framework with quantity-oriented prompts and a dual-stream decoder, improving object counting accuracy and generalizability in zero-shot scenarios.
Findings
Strong zero-shot counting performance on benchmarks
Effective knowledge sharing between Transformer and CNN streams
Improved quantity awareness through novel prompts and losses
Abstract
Recent advances in large vision-language models (VLMs) have shown remarkable progress in solving the text-promptable object counting problem. Representative methods typically specify text prompts with object category information in images. This, however, is insufficient for training the model to accurately distinguish the number of objects in the counting task. To this end, we propose QUANet, which introduces novel quantity-oriented text prompts with a vision-text quantity alignment loss to enhance the model's quantity awareness. Moreover, we propose a dual-stream adaptive counting decoder consisting of a Transformer stream, a CNN stream, and a number of Transformer-to-CNN enhancement adapters (T2C-adapters) for density map prediction. The T2C-adapters facilitate effective knowledge communication and aggregation between the Transformer and CNN streams. A cross-stream quantity ranking…
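To make the idea of quantity-oriented prompts with a vision-text alignment loss more concrete, here is a minimal sketch. It is not the authors' implementation: the count ranges in `QUANTITY_BINS`, the prompt wording, the helper names, and the CLIP-style cross-entropy over cosine similarities are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import math

# Hypothetical count ranges; the abstract does not state the actual prompt design.
QUANTITY_BINS = [(0, 10), (11, 50), (51, 200), (201, None)]


def quantity_prompts(category):
    """Build one text prompt per (assumed) count range for a given category."""
    prompts = []
    for low, high in QUANTITY_BINS:
        if high is None:
            prompts.append(f"a photo of more than {low - 1} {category}")
        else:
            prompts.append(f"a photo of {low} to {high} {category}")
    return prompts


def bin_index(count):
    """Return the index of the count range that contains `count`."""
    for i, (_, high) in enumerate(QUANTITY_BINS):
        if high is None or count <= high:
            return i


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(image_emb, text_embs, target, temperature=0.07):
    """Cross-entropy over image-to-prompt similarities (an assumed, CLIP-style form).

    The loss is small when the image embedding is closest to the prompt
    whose count range matches the ground-truth count.
    """
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]
```

In use, an image with a ground-truth count of 35 would be paired with `quantity_prompts("apples")[bin_index(35)]`, and the loss would pull the image embedding toward that prompt's text embedding and away from the prompts for the other count ranges. The embeddings themselves would come from the VLM's image and text encoders, which are outside the scope of this sketch.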
Taxonomy
Topics: Multimodal Machine Learning Applications · Digital Imaging for Blood Diseases · Advanced Neural Network Applications
