Reducing Predictive Feature Suppression in Resource-Constrained Contrastive Image-Caption Retrieval
Maurits Bleeker, Andrew Yates, Maarten de Rijke

TL;DR
This paper introduces latent target decoding (LTD), a novel approach that reduces predictive feature suppression in resource-constrained contrastive image-caption retrieval models by reconstructing captions in a latent space, improving retrieval performance.
Contribution
LTD adds a decoder to contrastive ICR models to reduce predictive feature suppression without requiring extra training data or complex negative mining. It is applicable across a range of contrastive losses and models.
Findings
LTD improves recall@k, r-precision, and nDCG scores over the baseline.
Implementing LTD as an optimization constraint is more effective than implementing it as a dual objective.
LTD is compatible with different contrastive losses and resource-constrained methods.
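To make the first finding concrete, here is a minimal sketch of how recall@k is commonly computed for retrieval: the fraction of queries for which at least one relevant candidate appears among the top k ranked results. The function name `recall_at_k`, the toy similarity scores, and the relevance sets are illustrative assumptions, not taken from the paper or its code.

```python
def recall_at_k(similarities, positives, k):
    """Fraction of queries with at least one relevant candidate in the top k.

    similarities: list of rows, one per query, holding one score per candidate.
    positives: list of sets, the relevant candidate indices for each query.
    """
    hits = 0
    for scores, relevant in zip(similarities, positives):
        # Rank candidate indices by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if relevant & set(ranked[:k]):
            hits += 1
    return hits / len(similarities)


# Hypothetical toy data: 3 caption queries scored against 4 candidate images.
sims = [
    [0.9, 0.1, 0.3, 0.2],  # relevant image 0 is ranked first
    [0.2, 0.4, 0.8, 0.1],  # relevant image 1 is ranked second
    [0.5, 0.6, 0.7, 0.1],  # relevant image 3 is ranked last
]
rel = [{0}, {1}, {3}]
r1 = recall_at_k(sims, rel, k=1)
r2 = recall_at_k(sims, rel, k=2)
```

In this toy example only the first query is answered correctly at k=1, while the second query also counts as a hit at k=2.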
Abstract
To train image-caption retrieval (ICR) methods, contrastive loss functions are a common choice for optimization functions. Unfortunately, contrastive ICR methods are vulnerable to predictive feature suppression. Predictive features are features that correctly indicate the similarity between a query and a candidate item. However, in the presence of multiple predictive features during training, encoder models tend to suppress redundant predictive features, since these features are not needed to learn to discriminate between positive and negative pairs. While some predictive features are redundant during training, these features might be relevant during evaluation. We introduce an approach to reduce predictive feature suppression for resource-constrained ICR methods: latent target decoding (LTD). We add an additional decoder to the contrastive ICR framework, to reconstruct the input…
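The idea described above can be sketched as a contrastive loss combined with a latent reconstruction term. The sketch below is illustrative only and rests on several assumptions that the excerpt does not confirm: InfoNCE stands in for the contrastive loss, the reconstruction term is a cosine distance between decoded vectors and given target latents, and the constraint variant uses a simple Lagrange-multiplier update. All function names, `beta`, `bound`, and `step` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(image_embs, caption_embs, temperature=0.1):
    """Image-to-caption InfoNCE; caption i is the positive for image i."""
    imgs = [normalize(v) for v in image_embs]
    caps = [normalize(v) for v in caption_embs]
    total = 0.0
    for i, img in enumerate(imgs):
        logits = [dot(img, cap) / temperature for cap in caps]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(imgs)


def latent_reconstruction(decoded, targets):
    """Mean (1 - cosine similarity) between decoded and target latents."""
    losses = [1.0 - dot(normalize(d), normalize(t)) for d, t in zip(decoded, targets)]
    return sum(losses) / len(losses)


def dual_objective(l_con, l_rec, beta=1.0):
    # Assumed dual-objective form: a fixed weighted sum of both losses.
    return l_con + beta * l_rec


def constrained_objective(l_con, l_rec, multiplier, bound):
    # Assumed constraint form: minimize l_con subject to l_rec <= bound,
    # written as a Lagrangian with a non-negative multiplier.
    return l_con + multiplier * (l_rec - bound)


def update_multiplier(multiplier, l_rec, bound, step=0.1):
    # Gradient ascent on the multiplier, clipped at zero.
    return max(0.0, multiplier + step * (l_rec - bound))


# Hypothetical 2-d embeddings for two image-caption pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
captions = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(images, captions)
shuffled = info_nce(images, captions[::-1])
```

Under these assumptions, the constraint form lets the multiplier grow while the reconstruction loss exceeds `bound` and shrink toward zero once the bound is met, whereas the dual objective keeps the weight `beta` fixed throughout training.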
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Learning
