Generative Prompt Model for Weakly Supervised Object Localization
Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan

TL;DR
This paper introduces GenPromp, a generative prompt model for weakly supervised object localization that effectively localizes entire objects, including less discriminative parts, by formulating WSOL as a conditional image denoising task.
Contribution
It is the first to propose a generative pipeline for WSOL, combining representative and discriminative embeddings to improve object localization accuracy.
Findings
Outperforms state-of-the-art discriminative models by over 5% in Top-1 localization accuracy.
Successfully localizes full object extent, including less discriminative parts.
Sets a new baseline for WSOL using a generative approach.
Abstract
Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels. Conventional methods that discriminatively train activation models ignore representative yet less discriminative object parts. In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising procedure. During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings. During inference, GenPromp combines the representative embeddings with discriminative embeddings (queried from an off-the-shelf vision-language model) for both representative and discriminative capacity. The…
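The inference step described in the abstract, combining a representative embedding with a discriminative one, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the convex-combination rule and its `weight`, the cosine-similarity attention map, the 0.5 threshold, and the helper names `combine_embeddings`, `attention_map`, and `box_from_map` are all hypothetical. The toy image has a "discriminative" object part and a "less discriminative" part with different features, to show why a prompt that mixes both embeddings can cover the full object extent.

```python
import numpy as np

def combine_embeddings(f_rep, f_dis, weight=0.5):
    """Convex combination of two prompt embeddings (hypothetical rule)."""
    return weight * f_rep + (1.0 - weight) * f_dis

def attention_map(pixel_feats, prompt):
    """Cosine similarity of each pixel feature to the prompt, min-max scaled to [0, 1]."""
    norms = np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    feats = pixel_feats / (norms + 1e-8)
    p = prompt / (np.linalg.norm(prompt) + 1e-8)
    sim = feats @ p  # shape (H, W)
    lo, hi = sim.min(), sim.max()
    return (sim - lo) / (hi - lo + 1e-8)

def box_from_map(amap, threshold=0.5):
    """Tight box (x_min, y_min, x_max, y_max) around pixels at or above the threshold."""
    ys, xs = np.nonzero(amap >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy embeddings (assumed, 4-dimensional for readability).
f_rep = np.array([1.0, 0.0, 0.0, 0.0])  # stands in for the learned representative embedding
f_dis = np.array([0.0, 1.0, 0.0, 0.0])  # stands in for the vision-language embedding

# Toy 8x8 feature grid: background everywhere, object in rows 2-5, columns 3-6.
pixel_feats = np.zeros((8, 8, 4))
pixel_feats[...] = [0.0, 0.0, 1.0, 0.0]          # background
pixel_feats[2:4, 3:7] = [0.0, 1.0, 0.0, 0.0]     # discriminative part
pixel_feats[4:6, 3:7] = [1.0, 0.0, 0.0, 0.0]     # less discriminative part

# Discriminative prompt alone covers only the discriminative part.
box_dis = box_from_map(attention_map(pixel_feats, f_dis))

# Combined prompt covers the whole object.
box_comb = box_from_map(attention_map(pixel_feats, combine_embeddings(f_rep, f_dis)))

print("discriminative only:", box_dis)
print("combined:", box_comb)
```

In this toy setting the discriminative prompt yields a box around rows 2-3 only, while the combined prompt yields a box spanning rows 2-5. The real model obtains its maps from a generative network rather than from raw cosine similarity, so this sketch only mirrors the intuition.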
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
