Symmetric Multi-Similarity Loss for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2024
Xiaoqi Wang, Yi Wang, Lap-Pui Chau

TL;DR
This report introduces the Symmetric Multi-Similarity Loss for video-text retrieval, which uses the challenge's correlation matrix as soft labels. It is the winning solution of the EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge at CVPR 2024.
Contribution
The paper proposes a novel Symmetric Multi-Similarity Loss that better exploits correlation matrix information for multi-instance video-text retrieval.
Findings
Achieved 63.76% average mAP on the public leaderboard
Achieved 74.25% average nDCG on the public leaderboard
Demonstrated the effectiveness of the proposed loss when combined with training tricks and ensemble learning
Abstract
In this report, we present our champion solution for the EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge at CVPR 2024. This challenge differs from traditional visual-text retrieval tasks in that it provides a correlation matrix, which acts as a set of soft labels for video-text clip combinations. However, existing loss functions have not fully exploited this information. Motivated by this, we propose a novel loss function, Symmetric Multi-Similarity Loss, which offers a more precise learning objective. Combined with training tricks and ensemble learning, the model achieves 63.76% average mAP and 74.25% average nDCG on the public leaderboard, demonstrating the effectiveness of our approach. Our code will be released at: https://github.com/xqwang14/SMS-Loss/tree/main
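To make the idea of soft labels concrete, the sketch below shows how a correlation matrix with graded values in [0, 1] can score a retrieval ranking using the standard nDCG definition. It is an illustration only, not the authors' loss or the official challenge evaluation code: the toy matrices, the function names (`dcg`, `ndcg_for_query`, `mean_ndcg`), and the choice to average the video-to-text and text-to-video directions are all assumptions made for this example.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance values listed in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg_for_query(similarities, relevances):
    """nDCG for one query: rank candidates by similarity, score with soft relevance."""
    order = sorted(range(len(similarities)),
                   key=lambda i: similarities[i], reverse=True)
    ranked = [relevances[i] for i in order]
    ideal_dcg = dcg(sorted(relevances, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(similarity, correlation):
    """Average nDCG over all queries (one query per row)."""
    scores = [ndcg_for_query(s, c) for s, c in zip(similarity, correlation)]
    return sum(scores) / len(scores)


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


# Hypothetical toy data: 2 videos (rows) x 3 captions (columns).
# correlation[i][j] is the soft relevance of caption j to video i.
correlation = [[1.0, 0.5, 0.0],
               [0.0, 0.5, 1.0]]
# similarity[i][j] is a made-up model score for the same pair.
similarity = [[0.9, 0.2, 0.4],
              [0.1, 0.3, 0.8]]

video_to_text = mean_ndcg(similarity, correlation)
text_to_video = mean_ndcg(transpose(similarity), transpose(correlation))
# Assumption: "average" means the mean of the two retrieval directions.
average_ndcg = (video_to_text + text_to_video) / 2
print(video_to_text, text_to_video, average_ndcg)
```

In this toy case the first video ranks a caption with relevance 0.0 above one with relevance 0.5, so its nDCG falls below 1. A hard binary label could not express that the 0.5 caption is partially correct, which is the information the soft labels carry.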
Taxonomy
Topics: Image Processing and 3D Reconstruction
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training
