Generative Prompt Model for Weakly Supervised Object Localization

Yuzhong Zhao; Qixiang Ye; Weijia Wu; Chunhua Shen; Fang Wan

arXiv:2307.09756·cs.CV·July 20, 2023

TL;DR

This paper introduces GenPromp, a generative prompt model for weakly supervised object localization that effectively localizes entire objects, including less discriminative parts, by formulating WSOL as a conditional image denoising task.

Contribution

The paper proposes what it describes as the first generative pipeline for WSOL, combining representative and discriminative embeddings to improve object localization accuracy.

Findings

1. Outperforms state-of-the-art discriminative models by over 5% in Top-1 localization accuracy.

2. Successfully localizes full object extent, including less discriminative parts.

3. Sets a new baseline for WSOL using a generative approach.

Abstract

Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels. Conventional methods that discriminatively train activation models ignore representative yet less discriminative object parts. In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising procedure. During training, GenPromp converts image category labels to learnable prompt embeddings which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings. During inference, GenPromp combines the representative embeddings with discriminative embeddings (queried from an off-the-shelf vision-language model) for both representative and discriminative capacity. The…
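The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`add_noise`, `denoising_loss`, `combine_embeddings`), the `weight` parameter, and the choice of a convex combination are all assumptions made for illustration, since the abstract does not say how the two embeddings are combined. The noising step and noise-prediction loss follow the standard diffusion formulation, which is assumed here rather than confirmed by the abstract.

```python
import math
import random


def add_noise(x0, alpha_bar, rng):
    """Standard forward diffusion step (assumed, not taken from the paper):
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, 1)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(x0, eps)
    ]
    return x_t, eps


def denoising_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise.
    In a conditional denoiser, eps_pred would depend on x_t and on the
    learnable prompt embedding; that network is omitted in this sketch."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


def combine_embeddings(representative, discriminative, weight=0.5):
    """Hypothetical convex combination of two prompt embeddings.
    The actual combination rule is not given in the abstract."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [
        weight * r + (1.0 - weight) * d
        for r, d in zip(representative, discriminative)
    ]


rng = random.Random(0)
x0 = [0.2, -0.5, 0.9, 0.1]             # stand-in for a flattened image
x_t, eps = add_noise(x0, 0.9, rng)      # training-time input with noise
loss_if_perfect = denoising_loss(eps, eps)
combined = combine_embeddings([1.0, 0.0], [0.0, 1.0], weight=0.5)
```

In this toy setup, training would adjust the prompt embedding so that the predicted noise approaches `eps`, and inference would condition the generative model on `combined` rather than on either embedding alone.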

Code & Models

Repositories

callsys/genpromp (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications