TL;DR
This paper introduces a language guidance objective for deep metric learning, using language embeddings to improve semantic consistency and generalization of visual similarity spaces, achieving state-of-the-art results.
Contribution
The paper proposes a language guidance approach that incorporates language embeddings of class names into deep metric learning, with the aim of improving the semantic consistency and transferability of the learned visual similarity space.
Findings
Language guidance yields significant improvements on all benchmarks evaluated.
The approach is model-agnostic and remains effective across various DML methods.
It achieves state-of-the-art performance on multiple datasets.
Abstract
Deep Metric Learning (DML) proposes to learn metric spaces which encode semantic similarities as embedding space distances. These spaces should be transferable to classes beyond those seen during training. Commonly, DML methods task networks to solve contrastive ranking tasks defined over binary class assignments. However, such approaches ignore higher-level semantic relations between the actual classes. This causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relation between classes, impacting the generalizability of the learned metric space. To tackle this issue, we propose a language guidance objective for visual similarity learning. Leveraging language embeddings of expert- and pseudo-classnames, we contextualize and realign visual representation spaces corresponding to meaningful language semantics for better semantic consistency.…
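The abstract describes realigning the visual similarity space with language semantics but, in the excerpt available here, does not give the loss itself. The sketch below shows one plausible way such an objective could look, and should not be read as the authors' implementation: it compares the row-wise softmax of a visual cosine-similarity matrix against that of a language cosine-similarity matrix using a KL divergence. The function names, the `temperature` parameter, and the choice of KL divergence are all assumptions made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(vectors):
    """Pairwise cosine similarities between all vectors in a batch."""
    unit = [normalize(v) for v in vectors]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def softmax(row, temperature=1.0):
    """Numerically stable softmax over one row of similarities."""
    scaled = [x / temperature for x in row]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def language_guidance_loss(visual_embeddings, language_embeddings, temperature=1.0):
    """Hypothetical guidance term (an assumption, not the paper's exact loss).

    visual_embeddings[i]   : image embedding of sample i
    language_embeddings[i] : language embedding of sample i's class name
    Returns the mean KL(language row || visual row) over the batch.
    """
    vis_sim = cosine_matrix(visual_embeddings)
    lang_sim = cosine_matrix(language_embeddings)
    loss = 0.0
    for vis_row, lang_row in zip(vis_sim, lang_sim):
        p = softmax(lang_row, temperature)  # target: language similarities
        q = softmax(vis_row, temperature)   # prediction: visual similarities
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return loss / len(vis_sim)
```

In a training loop, a term like this would typically be added to a standard DML ranking loss with a weighting factor, so that the class-label objective and the language-derived similarity structure are optimized together. The term is zero when the two similarity structures agree and positive otherwise.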
