Learning Disentangled Latent Factors from Paired Data in Cross-Modal Retrieval: An Implicit Identifiable VAE Approach
Minyoung Kim, Ricardo Guerrero, Vladimir Pavlovic

TL;DR
This paper introduces an implicit decoder approach in a variational autoencoder framework to learn disentangled latent factors in cross-modal data, improving factor identification accuracy without relying on explicit data decoding.
Contribution
It proposes an implicit decoder method with Jacobian regularization and extends the Identifiable VAE to incorporate query modality data for better factor disentanglement.
Findings
Accurately identifies true latent factors in various datasets.
Outperforms conventional encoder-decoder models in factor disentanglement.
Learned factors align with known domain-specific features in food data.
Abstract
We deal with the problem of learning the underlying disentangled latent factors that are shared between the paired bi-modal data in cross-modal retrieval. Our assumption is that the data in both modalities are complex, structured, and high dimensional (e.g., image and text), for which the conventional deep auto-encoding latent variable models such as the Variational Autoencoder (VAE) often suffer from difficulty of accurate decoder training or realistic synthesis. A suboptimally trained decoder can potentially harm the model's capability of identifying the true factors. In this paper we propose a novel idea of the implicit decoder, which completely removes the ambient data decoding module from a latent variable model, via implicit encoder inversion that is achieved by Jacobian regularization of the low-dimensional embedding function. Motivated from the recent Identifiable VAE (IVAE)…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Image Retrieval and Classification Techniques
