Do Lessons from Metric Learning Generalize to Image-Caption Retrieval?
Maurits Bleeker, Maarten de Rijke

TL;DR
This paper investigates whether recent metric learning loss functions outperform the traditional triplet loss in image-caption retrieval, finding that the triplet loss with semi-hard negatives remains superior due to its selective gradient computation.
Contribution
The paper introduces an analysis method that compares loss functions by counting how many samples contribute to the gradient w.r.t. the query representation, revealing why the triplet loss with semi-hard negatives outperforms newer metric learning losses in image-caption retrieval (ICR).
Findings
Triplet loss with semi-hard negatives outperforms newer metric learning losses in ICR.
Loss functions with lower evaluation scores take too many non-informative samples into account when computing the gradient w.r.t. the query representation.
Selective gradient computation, restricted to semi-hard negatives, leads to better retrieval performance.
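The idea of counting gradient-contributing samples can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the margin of 0.2, the dot-product similarity, and the toy vectors are all assumptions made for illustration. It also assumes the mined negative is the highest-scoring non-matching candidate in the batch, which is one common in-batch mining strategy and is not specified in this summary. A negative counts as contributing when its hinge term is strictly positive, since only then does it have a non-zero gradient w.r.t. the query.

```python
def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def triplet_mined(query, positive, negatives, margin=0.2):
    """Triplet loss over a single mined negative (assumed: highest-scoring one).

    Returns (loss, number of negatives with a non-zero gradient).
    """
    s_pos = dot(query, positive)
    s_neg = max(dot(query, n) for n in negatives)
    loss = max(0.0, margin - s_pos + s_neg)
    return loss, (1 if loss > 0 else 0)


def triplet_sum(query, positive, negatives, margin=0.2):
    """Triplet loss summed over every negative in the batch.

    Returns (loss, number of negatives with a non-zero gradient).
    """
    s_pos = dot(query, positive)
    terms = [max(0.0, margin - s_pos + dot(query, n)) for n in negatives]
    active = sum(1 for t in terms if t > 0)
    return sum(terms), active


# Hypothetical toy batch: one query, its matching item, four non-matching items.
query = [1.0, 0.0]
positive = [0.8, 0.6]
negatives = [[0.9, 0.1], [0.7, 0.3], [0.0, 1.0], [-1.0, 0.0]]

loss_mined, n_mined = triplet_mined(query, positive, negatives)
loss_sum, n_sum = triplet_sum(query, positive, negatives)
print("mined negative:", loss_mined, "contributing negatives:", n_mined)
print("sum over all:  ", loss_sum, "contributing negatives:", n_sum)
```

On this toy batch the mined variant has one contributing negative and the summed variant has two. The other two negatives already satisfy the margin and contribute nothing to either loss.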
Abstract
The triplet loss with semi-hard negatives has become the de facto choice for image-caption retrieval (ICR) methods that are optimized from scratch. Recent progress in metric learning has given rise to new loss functions that outperform the triplet loss on tasks such as image retrieval and representation learning. We ask whether these findings generalize to the setting of ICR by comparing three loss functions on two ICR methods. We answer this question negatively: the triplet loss with semi-hard negative mining still outperforms newly introduced loss functions from metric learning on the ICR task. To gain a better understanding of these outcomes, we introduce an analysis method to compare loss functions by counting how many samples contribute to the gradient w.r.t. the query representation during optimization. We find that loss functions that result in lower evaluation scores on the ICR…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Triplet Loss
