Retrieval-based Disentangled Representation Learning with Natural Language Supervision
Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Lei Chen

TL;DR
This paper introduces VDR, a retrieval-based framework that uses natural language descriptions to learn disentangled data representations, improving retrieval performance and interpretability across multiple datasets.
Contribution
VDR uses natural language as a proxy for the underlying variation in data and learns disentangled representations with a bi-encoder that maps both data and text into a vocabulary space. The framework is evaluated extensively on standard benchmarks.
Findings
Achieves 8.7% improvement in NDCG@10 on BEIR
Outperforms previous methods in mean recall on MS COCO and Flickr30k
Human evaluation shows interpretability comparable to SOTA captioning models
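NDCG@10, the metric behind the BEIR result above, rewards placing highly relevant documents near the top of a ranking. The sketch below assumes the linear-gain form DCG@k = Σ_{i=1..k} rel_i / log2(i + 1), as used by trec_eval-style evaluators; some toolkits use 2^rel − 1 as the gain instead. It also includes recall@k and a plain average, because "mean recall" on MS COCO and Flickr30k is commonly reported as the mean of R@1, R@5 and R@10 over both image-to-text and text-to-image retrieval. Whether the paper follows exactly this protocol is an assumption, and all function names here are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain with linear gains (assumed trec_eval-style form).
    enumerate() is 0-based, so rank i+1 is discounted by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, qrels, k=10):
    """NDCG@k for one query.
    ranked_ids: document ids in ranked order.
    qrels: dict mapping judged document id -> graded relevance."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    # Ideal DCG comes from all judged relevance grades, sorted best-first.
    idcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_ids, relevant, k):
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant)) / len(relevant)


def mean_recall(recalls):
    """Plain average of several recall@k values (e.g. R@1/5/10 in both directions)."""
    return sum(recalls) / len(recalls)


qrels = {"d1": 2, "d2": 1, "d9": 1}
print(round(ndcg_at_k(["d3", "d1", "d7", "d2"], qrels), 4))
print(recall_at_k(["a", "b", "c"], {"a", "c", "z"}, 2))
```

In practice these per-query scores are averaged over all queries in a dataset before being reported.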
Abstract
Disentangled representation learning remains challenging as the underlying factors of variation in the data do not naturally exist. The inherent complexity of real-world data makes it infeasible to exhaustively enumerate and encapsulate all its variations within a finite set of factors. However, it is worth noting that most real-world data have linguistic equivalents, typically in the form of textual descriptions. These linguistic counterparts can represent the data and can be effortlessly decomposed into distinct tokens. In light of this, we present Vocabulary Disentangled Retrieval (VDR), a retrieval-based framework that harnesses natural language as a proxy for the underlying data variation to drive disentangled representation learning. Our approach employs a bi-encoder model to represent both data and natural language in a vocabulary space, enabling the model to distinguish dimensions that…
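The idea of a bi-encoder operating in a vocabulary space can be illustrated with a toy example: every dimension of a representation corresponds to one vocabulary token, a description and a data item are encoded independently into non-negative vectors over that vocabulary, and relevance is their dot product. This is a minimal sketch under stated assumptions, not VDR's actual architecture: the eight-word vocabulary, the bag-of-words text encoder, the identity matrix standing in for learned projection weights, the hand-written "image features", and the ReLU used to keep weights non-negative are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy vocabulary: dimension i of every representation is token VOCAB[i].
VOCAB = ["dog", "cat", "grass", "ball", "sofa", "red", "running", "sleeping"]
TOKEN_ID = {tok: i for i, tok in enumerate(VOCAB)}


def encode_text(text: str) -> np.ndarray:
    """Hypothetical text encoder: binary bag of words over the vocabulary."""
    vec = np.zeros(len(VOCAB))
    for tok in text.lower().split():
        if tok in TOKEN_ID:
            vec[TOKEN_ID[tok]] = 1.0
    return vec


def encode_data(features: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Hypothetical data encoder: project features into vocabulary space, then
    apply ReLU so each dimension is a non-negative weight on one token."""
    return np.maximum(features @ proj, 0.0)


def score(query: np.ndarray, doc: np.ndarray) -> float:
    """Bi-encoder relevance: dot product of two independently computed vectors."""
    return float(query @ doc)


def top_tokens(vec: np.ndarray, k: int = 3) -> list[str]:
    """Read a representation back as its k highest-weighted (positive) tokens."""
    order = np.argsort(-vec)[:k]
    return [VOCAB[i] for i in order if vec[i] > 0]


# Hand-written stand-ins for image features (hypothetical data).
images = {
    "img_dog": np.array([2.0, -1.0, 1.5, 0.8, -0.5, 0.0, 1.2, -0.3]),
    "img_cat": np.array([-1.0, 2.2, -0.4, 0.0, 1.6, 0.3, -0.8, 1.1]),
}
proj = np.eye(len(VOCAB))  # identity as a placeholder for learned weights

query = encode_text("dog running on grass")
ranked = sorted(images, key=lambda n: score(query, encode_data(images[n], proj)), reverse=True)
print(ranked)  # ['img_dog', 'img_cat']
print(top_tokens(encode_data(images["img_dog"], proj)))  # ['dog', 'grass', 'running']
```

Because each dimension is tied to a token, `top_tokens` can read a data representation back as words, which illustrates why a vocabulary space makes individual dimensions directly inspectable.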
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Balanced Selection
