mTVR: Multilingual Moment Retrieval in Videos

Jie Lei; Tamara L. Berg; Mohit Bansal

arXiv:2108.00061 · cs.CL · August 3, 2021

TL;DR

This paper introduces mTVR, a large-scale multilingual video moment retrieval dataset with paired English and Chinese queries, and proposes mXML, a model trained on both languages that outperforms strong monolingual baselines while using fewer parameters.

Contribution

The paper presents mTVR, a new multilingual dataset for video moment retrieval, and mXML, a multilingual retrieval model that relies on encoder parameter sharing and language neighborhood constraints.

Findings

01 mXML outperforms strong monolingual baselines on mTVR

02 mTVR is multilingual, larger than existing moment retrieval datasets, and comes with diverse annotations

03 mXML achieves comparable or better accuracy with fewer parameters

Abstract

We introduce mTVR, a large-scale multilingual video moment retrieval dataset, containing 218K English and Chinese queries from 21.8K TV show video clips. The dataset is collected by extending the popular TVR dataset (in English) with paired Chinese queries and subtitles. Compared to existing moment retrieval datasets, mTVR is multilingual, larger, and comes with diverse annotations. We further propose mXML, a multilingual moment retrieval model that learns and operates on data from both languages, via encoder parameter sharing and language neighborhood constraints. We demonstrate the effectiveness of mXML on the newly collected mTVR dataset, where mXML outperforms strong monolingual baselines while using fewer parameters. We also provide detailed dataset analyses and model ablations. Data and code are publicly available at https://github.com/jayleicn/mTVRetrieval
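The two mechanisms named in the abstract, encoder parameter sharing and language neighborhood constraints, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`shared_encode`, `neighborhood_loss`), the mean-pooling encoder, the margin value, and the margin-based form of the loss are all assumptions chosen for illustration. The idea shown is that one projection matrix serves both languages, and a ranking loss pulls each English query toward its paired Chinese query and away from unpaired ones.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shared_encode(token_vectors, shared_w):
    """Hypothetical shared encoder: mean-pool token vectors, then apply
    one projection matrix that is reused for every language."""
    dim = len(token_vectors[0])
    pooled = [sum(tv[k] for tv in token_vectors) / len(token_vectors)
              for k in range(dim)]
    projected = [dot(row, pooled) for row in shared_w]
    return l2_normalize(projected)

def neighborhood_loss(en_vecs, zh_vecs, margin=0.2):
    """Illustrative margin ranking loss over paired queries (an assumed
    form, not the paper's exact formula). Pair i should be more similar
    than any unpaired combination by at least `margin`, in both directions."""
    total, count = 0.0, 0
    for i, en in enumerate(en_vecs):
        pos = dot(en, zh_vecs[i])
        for j in range(len(zh_vecs)):
            if j == i:
                continue
            # English anchor, unpaired Chinese query as negative.
            total += max(0.0, margin + dot(en, zh_vecs[j]) - pos)
            # Chinese anchor, unpaired English query as negative.
            total += max(0.0, margin + dot(en_vecs[j], zh_vecs[i]) - pos)
            count += 2
    return total / count if count else 0.0

# Toy data: two paired queries, one token vector each (made-up values).
shared_w = [[1.0, 0.0], [0.0, 1.0]]          # one matrix for both languages
en_queries = [[[1.0, 0.0]], [[0.0, 1.0]]]
zh_queries = [[[2.0, 0.0]], [[0.0, 3.0]]]

en_vecs = [shared_encode(q, shared_w) for q in en_queries]
zh_vecs = [shared_encode(q, shared_w) for q in zh_queries]

aligned = neighborhood_loss(en_vecs, zh_vecs)           # pairs match
misaligned = neighborhood_loss(en_vecs, zh_vecs[::-1])  # pairs swapped
```

In this toy setup the loss is zero when paired queries land on the same point and positive when the pairing is swapped, which is the behavior such a constraint is meant to reward. Sharing `shared_w` across languages is also what would reduce the parameter count relative to training one model per language.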

Code & Models

Repositories

jayleicn/mTVRetrieval (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques