VPN: Video Provenance Network for Robust Content Attribution
Alexander Black, Tu Bui, Simon Jenni, Vishy Swaminathan, John Collomosse

TL;DR
VPN is a robust video content attribution system that uses deep neural network embeddings to match transformed online videos to their original provenance, enabling reliable content tracking despite common modifications.
Contribution
It introduces a contrastive learning-based embedding method for video matching that is invariant to common online transformations, with an efficient indexing and retrieval system.
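As a rough illustration of the contrastive objective mentioned above, the sketch below computes an InfoNCE-style loss over a batch of paired embeddings, where each original clip is pulled toward its augmented copy and pushed away from the other clips in the batch. This is a generic formulation, not necessarily the loss used in the paper; the function name `info_nce_loss` and the `temperature` value are assumptions made for the example.

```python
import numpy as np

def info_nce_loss(originals, augmented, temperature=0.1):
    """InfoNCE-style contrastive loss for a batch of paired embeddings.

    Row i of `originals` and row i of `augmented` form a positive pair;
    every other row in the batch acts as a negative. The temperature is
    an illustrative default, not a value taken from the paper.
    """
    # L2-normalise so the dot product below is cosine similarity.
    a = originals / np.linalg.norm(originals, axis=1, keepdims=True)
    b = augmented / np.linalg.norm(augmented, axis=1, keepdims=True)

    # Pairwise similarities, scaled by the temperature.
    logits = a @ b.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The positive for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))
```

The loss is small when each embedding is closest to its own augmented copy, which is the invariance property the embedding is trained for.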
Findings
High accuracy recall over 100,000 videos
Effective matching despite changes in video quality and edits to content
Robust to various transformations and partial videos
Abstract
We present VPN, a content attribution method for recovering provenance information from videos shared online. Platforms and users often transform video to a different quality, codec, size, or shape, or lightly edit its content (for example, by adding text or emoji) as it is redistributed online. We learn a robust search embedding for matching such video, invariant to these transformations, using full-length or truncated video queries. Once a query is matched against a trusted database of video clips, the associated provenance information for the clip is presented to the user. We use an inverted index to match temporal chunks of video, using late fusion to combine visual and audio features. For both modalities, features are extracted via a deep neural network trained using contrastive learning on a dataset of original and augmented video clips. We demonstrate high accuracy recall over a corpus…
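The chunk-level matching and late fusion described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions: each temporal chunk is taken to be already quantised to an integer codeword, scoring is a simple vote count, and fusion is a weighted sum. The class `ChunkIndex`, the function `late_fusion`, and the `visual_weight` parameter are hypothetical names; the paper's actual quantisation, scoring, and fusion scheme are not given in this summary.

```python
from collections import defaultdict

class ChunkIndex:
    """Toy inverted index mapping codeword -> {(video_id, chunk_no)}.

    Assumes each temporal chunk has already been quantised to one
    integer codeword (a simplification made for this sketch).
    """

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, video_id, chunk_codewords):
        for chunk_no, word in enumerate(chunk_codewords):
            self.postings[word].add((video_id, chunk_no))

    def score(self, query_codewords):
        """Fraction of query chunks whose codeword occurs in each video."""
        votes = defaultdict(int)
        for word in query_codewords:
            # Count each video at most once per query chunk.
            for video_id in {vid for vid, _ in self.postings.get(word, ())}:
                votes[video_id] += 1
        n = max(len(query_codewords), 1)
        return {vid: count / n for vid, count in votes.items()}

def late_fusion(visual_scores, audio_scores, visual_weight=0.5):
    """Combine per-modality scores with a weighted sum, best match first."""
    videos = set(visual_scores) | set(audio_scores)
    fused = {
        vid: visual_weight * visual_scores.get(vid, 0.0)
        + (1.0 - visual_weight) * audio_scores.get(vid, 0.0)
        for vid in videos
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

A truncated query then only needs to supply the codewords for the chunks it contains, for example `late_fusion(visual.score([2, 3, 9]), audio.score([11, 12, 15]))`, where `visual` and `audio` are two `ChunkIndex` instances built over the trusted database.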
Taxonomy
Methods: Contrastive Learning
