TL;DR
This paper introduces a novel ranking loss for scene graph-based image embeddings, leveraging relative similarity supervision to improve semantic image retrieval performance and capture global scene context.
Contribution
It proposes a new contrastive ranking loss with a triple sampling strategy for learning from relative similarity labels in scene graph embeddings.
Findings
Outperforms existing contrastive losses on retrieval tasks.
Produces embeddings that capture global scene context.
Demonstrates robustness in semantic image retrieval.
Abstract
Scene graphs are a powerful structured representation of the underlying content of images, and embeddings derived from them have been shown to be useful in multiple downstream tasks. In this work, we employ a graph convolutional network to exploit structure in scene graphs and produce image embeddings useful for semantic image retrieval. Different from classification-centric supervision traditionally available for learning image representations, we address the task of learning from relative similarity labels in a ranking context. Rooted within the contrastive learning paradigm, we propose a novel loss function that operates on pairs of similar and dissimilar images and imposes relative ordering between them in embedding space. We demonstrate that this Ranking loss, coupled with an intuitive triple sampling strategy, leads to robust representations that outperform well-known contrastive…
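To make the idea of imposing relative ordering in embedding space concrete, here is a minimal sketch that uses a standard triplet margin loss as a stand-in. It is not the paper's Ranking loss, whose exact form is not given in the text above. The function names, the toy embeddings, the similarity matrix, and the margin value are all hypothetical, chosen only for illustration.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet hinge loss (a stand-in, not the paper's loss).

    The loss is zero once the positive is closer to the anchor
    than the negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def sample_triple(similarity, anchor, rng):
    """Hypothetical sampler: draw two other items and order them by
    their relative similarity to the anchor (more similar = positive)."""
    others = [i for i in range(len(similarity)) if i != anchor]
    i, j = rng.sample(others, 2)
    if similarity[anchor][i] >= similarity[anchor][j]:
        return anchor, i, j
    return anchor, j, i


# Toy data (assumed): three 2-D embeddings and pairwise similarity scores.
embeddings = [[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]]
similarity = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

rng = random.Random(0)
a, p, n = sample_triple(similarity, 0, rng)
loss = triplet_margin_loss(embeddings[a], embeddings[p], embeddings[n])
```

In this toy case the embedding distances already agree with the similarity labels, so the loss is zero. Swapping the positive and negative roles yields a positive loss, which is the signal that would push the embeddings toward the labeled ordering during training.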
Taxonomy
Methods: Contrastive Learning
