Learning to Align Generative Appearance Priors for Fine-grained Image Retrieval

Shijie Wang; Yadan Luo; Zijian Wang; Xin Yu; Zi Huang

arXiv:2605.09859·cs.CV·May 12, 2026

TL;DR

GAPan uses a normalizing-flow density model to align image features with class-conditional appearance priors, improving fine-grained image retrieval, especially on unseen categories.

Contribution

GAPan reformulates the FGIR learning objective from category prediction to appearance modeling with an invertible density model, with the aim of improving generalization to unseen categories.

Findings

01 Achieves state-of-the-art results on fine- and coarse-grained benchmarks.

02 Effectively models intra-category appearance variation.

03 Improves retrieval performance on unseen categories.

Abstract

Fine-grained image retrieval (FGIR) typically relies on supervision from seen categories to learn discriminative embeddings for retrieving unseen categories. However, such supervision often biases retrieval models toward the semantics of seen categories rather than the underlying appearance characteristics that generalize across categories, thereby limiting retrieval performance on unseen categories. To tackle this, we propose GAPan, a Generative Appearance Prior alignment network that reformulates the learning objective from category prediction toward appearance modeling. Technically, GAPan models retrieval features with an invertible density model based on normalizing flows. In the forward direction, the flow maps all instance features into a latent density space, where each seen category is modeled by a class-conditional Gaussian prior and optimized via exact likelihood estimation.…
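
The forward-direction objective described in the abstract (a flow maps a feature into latent space, where a class-conditional Gaussian prior gives an exact likelihood) can be illustrated with a minimal sketch. This is not the authors' implementation: the elementwise affine flow, the identity-covariance prior, and every name below (`affine_flow_forward`, `class_means`, and so on) are assumptions chosen to keep the example short. It only shows the change-of-variables identity log p(x | y) = log N(f(x); mu_y, I) + log |det df/dx|.

```python
import math


def affine_flow_forward(x, log_scale, shift):
    """Hypothetical stand-in flow: z_i = x_i * exp(log_scale_i) + shift_i.

    Returns the latent vector z and log|det dz/dx|, which for an
    elementwise affine map is simply the sum of the log-scales.
    """
    z = [xi * math.exp(s) + t for xi, s, t in zip(x, log_scale, shift)]
    log_det = sum(log_scale)
    return z, log_det


def affine_flow_inverse(z, log_scale, shift):
    """Exact inverse of affine_flow_forward (the flow is invertible)."""
    return [(zi - t) * math.exp(-s) for zi, s, t in zip(z, log_scale, shift)]


def gaussian_log_prob(z, mean):
    """log N(z; mean, I); identity covariance is an assumption of this sketch."""
    d = len(z)
    sq_dist = sum((zi - mi) ** 2 for zi, mi in zip(z, mean))
    return -0.5 * (d * math.log(2.0 * math.pi) + sq_dist)


def class_conditional_log_likelihood(x, label, log_scale, shift, class_means):
    """Exact log-likelihood of feature x under the prior of class `label`."""
    z, log_det = affine_flow_forward(x, log_scale, shift)
    return gaussian_log_prob(z, class_means[label]) + log_det


# Toy setup: 2-D features and two seen categories (all values are made up).
log_scale = [0.1, -0.2]
shift = [0.5, 0.0]
class_means = {0: [0.0, 0.0], 1: [3.0, 3.0]}

x = [0.2, -0.1]
# Negative log-likelihood of x under its (assumed) ground-truth class 0;
# a training loop would minimize this over the flow and prior parameters.
nll = -class_conditional_log_likelihood(x, 0, log_scale, shift, class_means)
```

In a real model the flow would be a stack of learned invertible layers and the class means would be trainable parameters; the structure of the loss (prior log-density plus log-determinant) stays the same.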
