Self-Contained Entity Discovery from Captioned Videos

Melika Ayoughi; Pascal Mettes; Paul Groth

arXiv:2208.06662·cs.CV·August 16, 2022

TL;DR

This paper presents a method for discovering named entities in videos using only the videos and their captions, without task-specific annotations or external knowledge sources, and introduces new benchmarks for evaluation.

Contribution

The work proposes a three-stage, self-contained approach for entity discovery in videos from multimodal data, along with new benchmarks based on popular TV series.

Findings

1. Achieves entity recognition accuracy close to supervised methods.
2. Demonstrates effectiveness on new benchmarks derived from TV series.
3. Highlights challenges and future directions for self-contained visual entity discovery.

Abstract

This paper introduces the task of visual named entity discovery in videos without the need for task-specific supervision or task-specific external knowledge sources. Assigning specific names to entities (e.g. faces, scenes, or objects) in video frames is a long-standing challenge. Commonly, this problem is addressed as a supervised learning objective by manually annotating faces with entity labels. To bypass the annotation burden of this setup, several works have investigated the problem by utilizing external knowledge sources such as movie databases. While effective, such approaches do not work when task-specific knowledge sources are not provided and can only be applied to movies and TV series. In this work, we take the problem a step further and propose to discover entities in videos from videos and corresponding captions or subtitles. We introduce a three-stage method where we (i)…

Code & Models

Repositories

melika-ayoughi/self-contained-video-entity-discovery
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning