A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval

Alex Falcon; Giuseppe Serra; Oswald Lanz

arXiv:2208.02080 · cs.CV · August 4, 2022

TL;DR

This paper introduces a feature-space multimodal data augmentation method for text-video retrieval. It improves retrieval performance by generating new training samples through semantic mixing of features; because it never transforms the raw data, it also sidesteps copyright restrictions on sharing raw videos.

Contribution

The paper presents a novel feature-space augmentation technique for text-video retrieval that improves accuracy without relying on resource-intensive raw data modifications.

Findings

1. Significant performance improvements on the EPIC-Kitchens-100 dataset
2. State-of-the-art results with the proposed method
3. Extensive ablation studies confirming the method's effectiveness

Abstract

Every hour, huge amounts of visual content are posted on social media and user-generated content platforms. To find relevant videos by means of a natural language query, text-video retrieval methods have received increased attention over the past few years. Data augmentation techniques were introduced to increase performance on unseen test examples by creating new training samples through semantics-preserving transformations, such as color space or geometric transformations on images. Yet these techniques are usually applied to raw data, which makes them more resource-demanding and also requires the raw data to be shareable; this is not always possible, e.g. because of copyright restrictions on clips from movies or TV series. To address this shortcoming, we propose a multimodal data augmentation technique which works in the feature space and creates new videos and captions…
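The abstract describes creating new videos and captions directly in the feature space rather than on raw clips. A minimal sketch of that idea follows, written as a mixup-style interpolation of precomputed features; it is not the authors' implementation (the official code is in the aranciokov/fsmmda_videoretrieval repository). Assumptions, none of which come from the paper: "semantically similar" is approximated by an exact match on a discrete semantic label (e.g. an action class), the mixing coefficient is drawn from Beta(alpha, alpha) as in mixup, the same coefficient is used for the video and caption features of a pair, and the names `mix` and `augment_batch` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence, Tuple

Vector = List[float]


def mix(a: Sequence[float], b: Sequence[float], lam: float) -> Vector:
    """Return the convex combination lam * a + (1 - lam) * b."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimension")
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def augment_batch(
    video_feats: Sequence[Sequence[float]],
    text_feats: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    alpha: float = 0.4,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Vector], List[Vector], List[Hashable]]:
    """Create new (video, caption) feature pairs by mixing semantically similar samples.

    ASSUMPTION (hypothetical helper): two samples count as semantically
    similar when their labels are equal; this is a stand-in for whatever
    similarity criterion the paper actually uses.
    """
    if not (len(video_feats) == len(text_feats) == len(labels)):
        raise ValueError("videos, captions and labels must be aligned")
    rng = rng or random.Random()

    # Group sample indices by semantic label.
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)

    new_videos, new_texts, new_labels = [], [], []
    for i, lab in enumerate(labels):
        partners = [j for j in by_label[lab] if j != i]
        if not partners:
            continue  # no semantically similar partner in this batch
        j = rng.choice(partners)
        # ASSUMPTION: mixup-style coefficient, shared by both modalities.
        lam = rng.betavariate(alpha, alpha)
        new_videos.append(mix(video_feats[i], video_feats[j], lam))
        new_texts.append(mix(text_feats[i], text_feats[j], lam))
        new_labels.append(lab)
    return new_videos, new_texts, new_labels


if __name__ == "__main__":
    # Toy 2-D features; real video/text features would come from pretrained encoders.
    videos = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    captions = [[2.0, 2.0], [4.0, 4.0], [9.0, 9.0]]
    labels = ["cut onion", "cut onion", "wash pan"]
    v, t, l = augment_batch(videos, captions, labels, rng=random.Random(0))
    print(len(v), l)  # 2 ['cut onion', 'cut onion']
```

Sharing one coefficient across modalities keeps each mixed caption aligned with its mixed video, so the new pair can be treated as a matching video-caption pair during training. In this sketch, samples with no same-label partner in the batch are simply skipped.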

Code & Models

Repositories

aranciokov/fsmmda_videoretrieval
PyTorch · Official