Learn to Understand Negation in Video Retrieval

Ziyue Wang; Aozhu Chen; Fan Hu; Xirong Li

arXiv:2205.00132·cs.MM·July 14, 2022

TL;DR

This paper introduces a novel method for training video retrieval models to understand negation in natural language queries, improving their ability to handle negated descriptions and enhancing overall retrieval performance.

Contribution

The paper re-purposes two existing datasets (MSR-VTT and VATEX) to evaluate how well video retrieval models understand negation, and proposes a training method that incorporates a negation-aware loss.
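To make the idea of a negation-aware loss concrete, here is a minimal sketch of one possible form: a hinge (margin) loss that asks a video to score higher against its original caption than against a negated version of that caption. This is an illustrative assumption, not the paper's actual formulation; the function names, the cosine similarity, the margin value, and the toy embeddings are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def negation_margin_loss(video, caption, negated_caption, margin=0.2):
    """Hypothetical negation-aware hinge loss (an assumption, not the paper's loss).

    The loss is zero once the video is more similar to its original caption
    than to the negated caption by at least `margin`.
    """
    pos = cosine(video, caption)
    neg = cosine(video, negated_caption)
    return max(0.0, margin + neg - pos)


# Toy 2-D embeddings, made up for illustration only.
video = [1.0, 0.0]
caption = [1.0, 0.0]

# A model that separates the negated caption from the video: no penalty.
negated_separated = [0.0, 1.0]
loss_separated = negation_margin_loss(video, caption, negated_separated)

# A model that ignores the negator embeds both captions identically,
# so it pays the full margin as a penalty.
negated_ignored = [1.0, 0.0]
loss_ignored = negation_margin_loss(video, caption, negated_ignored)

print(loss_separated, loss_ignored)
```

In this toy setup the model that ignores the negator is penalized by roughly the margin, while the model that pushes the negated caption away from the video incurs no loss. A loss of this kind would typically be added to the standard retrieval loss rather than replace it.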

Findings

1. Improved retrieval accuracy on negation queries.
2. Enhanced overall performance on standard benchmarks.
3. Effective use of partially negated captions for training.

Abstract

Negation is a common linguistic skill that allows humans to express what we do NOT want. Naturally, one might expect video retrieval to support natural-language queries with negation, e.g., finding shots of kids sitting on the floor and not playing with a dog. However, state-of-the-art deep learning based video retrieval models lack this ability, as they are typically trained on video description datasets such as MSR-VTT and VATEX that lack negated descriptions. Their retrieved results essentially ignore the negator in the sample query, incorrectly returning videos that show kids playing with a dog. This paper presents the first study on learning to understand negation in video retrieval and makes the following contributions. By re-purposing two existing datasets (MSR-VTT and VATEX), we propose a new evaluation protocol for video retrieval with negation. We propose a learning based method for…

Code & Models

Repositories

ruc-aimc-lab/nt2vr (PyTorch, official)

Taxonomy

Methods: Contrastive Language-Image Pre-training