Rudder: A Cross Lingual Video and Text Retrieval Dataset

Jayaprakash A, Abhishek, Rishabh Dabral, Ganesh Ramakrishnan, Preethi Jyothi

arXiv:2103.05457 · cs.IR · March 10, 2021 · 1 citation

TL;DR

This paper introduces Rudder, a multilingual video-text retrieval dataset, and proposes a partial order loss that improves joint text-video embeddings, especially in data-scarce multilingual settings, where it outperforms the conventional max-margin and triplet losses.

Contribution

The paper presents Rudder, a new multilingual dataset for video-text retrieval, and introduces a partial order loss that enhances embedding quality in low-data scenarios.

Findings

1. Partial order loss outperforms max-margin and triplet losses.
2. Significant improvements in retrieval performance on MSR-VTT and DiDeMo.
3. Cross-lingual training improves retrieval accuracy across languages.
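Text-to-video retrieval results on benchmarks such as MSR-VTT are commonly reported as Recall@K, the fraction of text queries whose matching video appears among the top K ranked videos. This page does not say which metrics the paper reports, so the following is only a minimal sketch of that common metric. It assumes a similarity matrix in which the ground-truth video for query i is video i; the function name `recall_at_k` and the toy scores are hypothetical.

```python
from typing import List


def recall_at_k(sim: List[List[float]], k: int) -> float:
    """Fraction of text queries whose ground-truth video ranks in the top k.

    Assumes sim[i][j] is the similarity between text query i and video j,
    and that the matching video for query i is video i (an illustrative
    convention, not taken from the Rudder paper).
    """
    hits = 0
    for i, row in enumerate(sim):
        # 0-based rank = number of videos scoring strictly higher than the
        # correct one (ties are resolved in the correct video's favour).
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)


sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.8, 0.2, 0.5],  # query 1: correct video ranked 3rd
    [0.1, 0.7, 0.6],  # query 2: correct video ranked 2nd
]
print(recall_at_k(sim, 1))  # R@1 = 1/3
print(recall_at_k(sim, 2))  # R@2 = 2/3
```

Counting only strictly higher scores keeps the sketch short; a stricter evaluation would break ties against the correct video.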

Abstract

Video retrieval using natural language queries requires learning semantically meaningful joint embeddings between the text and the audio-visual input. Often, such joint embeddings are learnt using pairwise (or triplet) contrastive loss objectives, which cannot give enough attention to 'difficult-to-retrieve' samples during training. This problem is especially pronounced in data-scarce settings, where the dataset is too small (10% of the size of the large-scale MSR-VTT) to cover the rather complex audio-visual embedding space. In this context, we introduce Rudder, a multilingual video-text retrieval dataset that includes audio and textual captions in Marathi, Hindi, Tamil, Kannada, Malayalam and Telugu. Furthermore, we propose to compensate for data scarcity by using domain knowledge to augment supervision. To this end, in addition to the conventional three samples of a triplet (anchor, positive,…
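The abstract contrasts the proposed loss with conventional triplet (max-margin) objectives. As a point of reference, here is a minimal sketch of a standard triplet hinge loss on cosine similarity, with a caption embedding as the anchor, its matching video as the positive and a non-matching video as the negative. The margin of 0.2, the toy 2-D vectors and the function names are assumptions for illustration. This is not the paper's partial order loss, whose definition is cut off in the abstract above.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def triplet_margin_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.2,  # illustrative value, not taken from the paper
) -> float:
    """Hinge (max-margin) triplet loss on cosine similarity.

    The loss is zero once the positive scores at least `margin` higher than
    the negative against the anchor; otherwise it grows linearly.
    """
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


text = [1.0, 0.0]  # anchor: caption embedding (toy 2-D vector)

# Easy triplet: the negative video is orthogonal to the caption.
easy = triplet_margin_loss(text, positive=[1.0, 0.0], negative=[0.0, 1.0])

# Hard triplet: the negative video is closer to the caption than the positive.
hard = triplet_margin_loss(text, positive=[1.0, 1.0], negative=[1.0, 0.0])

print(easy)  # 0.0
print(round(hard, 4))  # 0.4929
```

Once a triplet satisfies the margin it contributes nothing to training, so only the hard triplet above produces a non-zero loss. One common reading of the abstract's criticism is that, when such terms are averaged over many easy negatives, the few difficult-to-retrieve samples carry little weight.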

Code & Models

Repositories

nshubham655/RUDDER (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Music and Audio Processing · Human Pose and Action Recognition