Automated Video Labelling: Identifying Faces by Corroborative Evidence
Andrew Brown, Ernesto Coto, Andrew Zisserman

TL;DR
This paper introduces an automated face-labelling system for videos that combines corroborative evidence from visual and audio sources, enabling scalable, high-precision indexing of large video archives without manual annotation or supervision.
Contribution
The paper presents a novel method for determining if a person is famous using image-search engines, facilitating reliable face-identity modeling for automatic labeling.
Findings
Works across multiple video domains without domain adaptation.
Achieves state-of-the-art results on public benchmarks.
Effectively labels faces of both famous and less-famous individuals.
Abstract
We present a method for automatically labelling all faces in video archives, such as TV broadcasts, by combining multiple evidence sources and multiple modalities (visual and audio). We target the problem of ever-growing online video archives, where an effective, scalable indexing solution cannot require a user to provide manual annotation or supervision. To this end, we make three key contributions: (1) We provide a novel, simple method for determining whether a person is famous using image-search engines. In turn, this enables a face-identity model to be built reliably and robustly, and used for high-precision automatic labelling; (2) We show that even for less-famous people, image-search engines can then be used for corroborative evidence to accurately label faces that are named in the scene or the speech; (3) Finally, we quantitatively demonstrate the benefits of our approach on…
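The abstract does not spell out how the fame test works, so the following is only an illustrative sketch of one plausible reading, not the authors' procedure. It assumes that face embeddings have already been extracted from the top image-search results for a name, and that a famous person's results are dominated by a single face identity. The function names, the cosine-similarity measure, and both thresholds (`sim_threshold`, `min_fraction`) are hypothetical choices made for this example.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dominant_identity_fraction(
    embeddings: Sequence[Sequence[float]], sim_threshold: float = 0.7
) -> float:
    """Largest fraction of search results whose faces agree with one result.

    Assumption: `embeddings` holds one face embedding per image returned
    by an image-search query for a person's name.
    """
    n = len(embeddings)
    if n == 0:
        return 0.0
    best = 0
    for anchor in embeddings:
        # Count results similar to this anchor (the anchor matches itself).
        count = sum(1 for other in embeddings if cosine(anchor, other) >= sim_threshold)
        best = max(best, count)
    return best / n


def looks_famous(
    embeddings: Sequence[Sequence[float]],
    sim_threshold: float = 0.7,
    min_fraction: float = 0.6,
) -> bool:
    """Hypothetical fame test: one identity dominates the search results."""
    return dominant_identity_fraction(embeddings, sim_threshold) >= min_fraction


# Toy 2-D/3-D "embeddings": three near-identical faces plus one outlier,
# versus three mutually dissimilar faces.
consistent = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0]]
mixed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(looks_famous(consistent), looks_famous(mixed))
```

Under this reading, the results that agree with the dominant identity could serve as the exemplars for a face-identity model, while a name whose results show no dominant identity would instead rely on corroborating evidence from the scene or the speech, as in contribution (2).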
