Dissecting Deep Metric Learning Losses for Image-Text Retrieval
Hong Xuan, Xi Chen

TL;DR
This paper introduces GOAL, a framework for analyzing and designing gradient-based objectives in deep metric learning for image-text retrieval. The resulting objectives improve retrieval performance and reach state-of-the-art results on COCO and Flickr30K.
Contribution
The paper proposes a novel gradient analysis framework and new gradient-based objectives that enhance deep metric learning for image-text retrieval.
Findings
Consistently improved retrieval performance across various models.
Achieved state-of-the-art results on the COCO and Flickr30K datasets.
Demonstrated the generalizability of GOAL to different loss functions.
Abstract
Visual-Semantic Embedding (VSE) is a prevalent approach in image-text retrieval that learns a joint embedding space between the image and language modalities in which semantic similarities are preserved. The triplet loss with hard-negative mining has become the de facto objective for most VSE methods. Inspired by recent progress in deep metric learning (DML) in the image domain, which has given rise to new loss functions that outperform the triplet loss, in this paper we revisit the problem of finding better objectives for VSE in image-text matching. Despite some attempts at designing losses based on gradient movement, most DML losses are defined empirically in the embedding space. Instead of directly applying these loss functions, which may lead to sub-optimal gradient updates in model parameters, in this paper we present a novel Gradient-based Objective AnaLysis framework, or GOAL,…
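For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a triplet loss with hardest-negative mining, in the max-of-hinges form commonly used for VSE. It is not the GOAL objective and is not taken from the paper's code. The function name, the margin value of 0.2, and the plain-list similarity matrix are illustrative assumptions.

```python
def hardest_negative_triplet_loss(sim, margin=0.2):
    """Sum of max-violation hinge losses over a square similarity matrix.

    sim[i][j] is the similarity between image i and caption j, and the
    diagonal holds the matched (positive) pairs. Requires at least two pairs.
    The name and the default margin are hypothetical, chosen for illustration.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Hardest negative caption for image i (image-to-text direction).
        hard_cap = max(sim[i][j] for j in range(n) if j != i)
        # Hardest negative image for caption i (text-to-image direction).
        hard_img = max(sim[j][i] for j in range(n) if j != i)
        # Hinge: penalize only when a negative comes within `margin` of the positive.
        total += max(0.0, margin + hard_cap - pos)
        total += max(0.0, margin + hard_img - pos)
    return total


# Toy batch of two image-caption pairs with made-up similarity scores.
toy_sim = [
    [0.9, 0.1],
    [0.8, 0.7],
]
toy_loss = hardest_negative_triplet_loss(toy_sim)
```

In this toy batch only two of the four hinge terms are active: caption 0 is too close to image 1, which violates the margin once in each retrieval direction. Because only the single hardest negative per query contributes, the gradient reaches just a few pairs per batch, which is the kind of behavior a gradient-level analysis of the loss would examine.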
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Triplet Loss
