InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution
Xiangru Jian, Yimu Wang

TL;DR
This paper identifies the representation degeneration problem in cross-modal retrieval, introduces InvGC, a graph convolution-based post-processing method to enhance representation separation, and demonstrates its effectiveness through extensive experiments.
Contribution
The paper empirically validates the representation degeneration problem and proposes InvGC with LocalAdj to improve cross-modal retrieval performance.
Findings
InvGC significantly mitigates representation degeneration.
InvGC improves retrieval accuracy across multiple benchmarks.
Theoretical analysis confirms recall bounds are enhanced by InvGC.
Abstract
Over recent decades, significant advancements in cross-modal retrieval have been mainly driven by breakthroughs in visual and linguistic modeling. However, a recent study shows that multi-modal data representations tend to cluster within a limited convex cone (the representation degeneration problem), which hinders retrieval performance due to the inseparability of these representations. In our study, we first empirically validate the presence of the representation degeneration problem across multiple cross-modal benchmarks and methods. Next, to address it, we introduce a novel method called InvGC, a post-processing technique inspired by graph convolution and average pooling. Specifically, InvGC defines the graph topology within the datasets and then applies graph convolution in a subtractive manner. This method effectively separates representations by increasing the distances between data…
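The idea of applying graph convolution "in a subtractive manner" can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the cosine-similarity adjacency, the row normalization, the step size `r`, and the synthetic data are all assumptions made for illustration. It builds a similarity graph over one set of embeddings, subtracts each point's neighbor-weighted average from it, and compares the mean pairwise cosine similarity before and after as a rough proxy for how tightly the representations cluster in a cone.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def mean_pairwise_cosine(x):
    """Average cosine similarity over all distinct pairs of rows.

    Used here as an assumed proxy for degeneration: values near 1 mean
    the embeddings sit in a narrow cone.
    """
    x = l2_normalize(x)
    sim = x @ x.T
    n = sim.shape[0]
    off_diag = sim.sum() - np.trace(sim)  # drop self-similarities
    return off_diag / (n * (n - 1))


def subtractive_graph_conv(x, r=0.5):
    """Hypothetical subtractive graph convolution step.

    Adjacency is the non-negative cosine similarity with no self-loops,
    row-normalized so each row sums to 1. Each embedding then has r times
    its neighbor-weighted average subtracted, followed by re-normalization.
    """
    x = l2_normalize(x)
    adj = x @ x.T
    np.fill_diagonal(adj, 0.0)        # remove self-loops
    adj = np.clip(adj, 0.0, None)     # keep only non-negative edges
    adj = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1e-12)
    return l2_normalize(x - r * (adj @ x))


# Synthetic "degenerate" embeddings: a shared offset pushes every point
# into the same narrow cone.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(50, 16)) + 3.0

before = mean_pairwise_cosine(gallery)
after = mean_pairwise_cosine(subtractive_graph_conv(gallery, r=0.5))
print(f"mean pairwise cosine: before={before:.3f}, after={after:.3f}")
```

Subtracting the neighbor average removes part of the component shared by all points, so the remaining per-point variation accounts for more of each vector and the pairwise similarities drop. The step size `r` here is arbitrary; how it should be chosen, and how the graph is built across query and gallery sets, is not specified in the text above.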
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Convolution
