Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels
Daniel Y. Fu, Will Crichton, James Hong, Xinwei Yao, Haotian Zhang, Anh Truong, Avanika Narayan, Maneesh Agrawala, Christopher Ré, Kayvon Fatahalian

TL;DR
Rekall is a library that allows users to specify and detect new video events by composing outputs of pre-trained models through a query language, enabling rapid and accurate event detection without training new models.
Contribution
The paper introduces Rekall, a novel library for compositional video event specification using existing models, facilitating quick and accurate detection of domain-specific events.
Findings
Domain experts can create effective queries in hours to detect new events.
Rekall queries achieve detection accuracy comparable to or better than that of learned models.
Novice users can learn to author queries within an hour.
Abstract
Many real-world video analysis applications require the ability to identify domain-specific events in video, such as interviews and commercials in TV news broadcasts, or action sequences in film. Unfortunately, pre-trained models to detect all the events of interest in video may not exist, and training new models from scratch can be costly and labor-intensive. In this paper, we explore the utility of specifying new events in video in a more traditional manner: by writing queries that compose outputs of existing, pre-trained models. To write these queries, we have developed Rekall, a library that exposes a data model and programming model for compositional video event specification. Rekall represents video annotations from different sources (object detectors, transcripts, etc.) as spatiotemporal labels associated with continuous volumes of spacetime in a video, and provides operators for…
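The data model described in the abstract can be illustrated with a minimal Python sketch. This is not Rekall's actual API: the names Label, overlaps_in_time, time_intersection, and join are hypothetical, and the face detections are made-up inputs. The sketch only shows the general idea of treating model outputs as labels over a volume of spacetime and composing them with a join.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Label:
    """Hypothetical spatiotemporal label: a box in (time, x, y) plus a payload."""
    t1: float          # start time in seconds
    t2: float          # end time in seconds
    x1: float = 0.0    # spatial bounds, normalized to [0, 1]
    x2: float = 1.0
    y1: float = 0.0
    y2: float = 1.0
    payload: Any = None


def overlaps_in_time(a: Label, b: Label) -> bool:
    """True when the two labels share a non-empty time interval."""
    return a.t1 < b.t2 and b.t1 < a.t2


def time_intersection(a: Label, b: Label) -> Label:
    """Merge two labels: intersect in time, take the bounding box in space."""
    return Label(
        max(a.t1, b.t1), min(a.t2, b.t2),
        min(a.x1, b.x1), max(a.x2, b.x2),
        min(a.y1, b.y1), max(a.y2, b.y2),
        payload=(a.payload, b.payload),
    )


def join(left: List[Label], right: List[Label],
         predicate: Callable[[Label, Label], bool],
         merge: Callable[[Label, Label], Label]) -> List[Label]:
    """Pair up labels from two sets that satisfy the predicate, then merge them."""
    return [merge(a, b) for a in left for b in right if predicate(a, b)]


# Made-up outputs of an assumed pre-trained face detector and identity model.
host_faces = [
    Label(0.0, 12.0, 0.1, 0.4, 0.2, 0.6, "host"),
    Label(30.0, 40.0, 0.1, 0.4, 0.2, 0.6, "host"),
]
guest_faces = [Label(5.0, 20.0, 0.6, 0.9, 0.2, 0.6, "guest")]

# Candidate interview segments: times when host and guest are both on screen.
interview_candidates = join(host_faces, guest_faces,
                            overlaps_in_time, time_intersection)
```

Here the only candidate produced spans seconds 5.0 to 12.0, the interval during which both faces are visible. A real query would likely add further steps, such as merging nearby segments and filtering out short ones.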
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Human Motion and Animation
