Exploiting Semantic Role Contextualized Video Features for Multi-Instance Text-Video Retrieval: EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022

Burak Satar; Hongyuan Zhu; Hanwang Zhang; Joo Hwee Lim

arXiv:2206.14381 · cs.CV · September 27, 2023 · 1 citation

TL;DR

This paper introduces a method for multi-instance text-video retrieval that parses sentences into semantic roles and uses self-attention to contextualize video features. It ranks 3rd in nDCG and 4th in mAP in the EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.

Contribution

It combines semantic role parsing of sentences with self-attention over video features, trained with triplet losses in multiple embedding spaces, to improve text-video retrieval.

Findings

01 Outperforms the baseline on the nDCG metric
02 Achieves 3rd place in the nDCG ranking
03 Achieves 4th place in the mAP ranking
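For readers unfamiliar with the nDCG metric named in the findings, the following is a minimal sketch of the standard textbook definition, not the challenge's official evaluation code. The function names `dcg` and `ndcg` and the example relevance values are illustrative assumptions; the challenge defines its own graded relevance between captions and videos, which is not reproduced here.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevances."""
    # Rank positions start at 1; the gain at each rank is discounted by log2(rank + 1).
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(ranked_relevances):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    if ideal == 0:
        # No relevant items at all: define nDCG as 0 to avoid dividing by zero.
        return 0.0
    return dcg(ranked_relevances) / ideal


# Hypothetical relevance scores of the retrieved items, in ranked order.
perfect_ranking = [1.0, 0.5, 0.0]
reversed_ranking = [0.0, 0.5, 1.0]
print(ndcg(perfect_ranking), ndcg(reversed_ranking))
```

Because relevance is graded rather than binary, a ranking that places partially relevant items near the top still earns credit, which is why nDCG is often read as a measure of semantic similarity rather than exact-match accuracy.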

Abstract

In this report, we present our approach for the EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022. We first parse sentences into semantic roles corresponding to verbs and nouns. We then use self-attention to exploit semantic role contextualized video features along with textual features via triplet losses in multiple embedding spaces. Our method surpasses the strong baseline in normalized Discounted Cumulative Gain (nDCG), which is the more informative metric for semantic similarity. Our submission is ranked 3rd for nDCG and 4th for mAP.
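The idea of applying triplet losses in multiple embedding spaces can be illustrated with a small sketch. This is a generic margin-based triplet loss over cosine similarity, summed across spaces. It is not the authors' implementation. The space names (`verb`, `noun`, `sentence`), the margin value, and the helper names are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss that pushes sim(anchor, positive) above sim(anchor, negative) by a margin."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def multi_space_loss(triplets_by_space, margin=0.2):
    """Sum the triplet loss over every embedding space (space names are hypothetical)."""
    return sum(triplet_loss(a, p, n, margin)
               for a, p, n in triplets_by_space.values())


# Toy example: one (text anchor, matching video, non-matching video) triplet per space.
triplets = {
    "verb": ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]),      # well separated, zero loss
    "noun": ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0]),      # wrong order, large loss
    "sentence": ([1.0, 1.0], [1.0, 1.0], [1.0, -1.0]),  # well separated, zero loss
}
print(multi_space_loss(triplets))
```

In a real system the vectors would come from learned text and video encoders, and the loss would be computed on batches with a framework such as PyTorch. The sketch only shows how per-space losses combine into one training objective.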

Code & Models

Repositories

buraksatar/RoME_video_retrieval
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning