EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions [Technical Report]

Enhao Zhang; Maureen Daum; Dong He; Brandon Haynes; Ranjay Krishna; Magdalena Balazinska

arXiv:2301.00929·cs.DB·August 9, 2023

TL;DR

EQUI-VOCAL is a system that enables users to efficiently generate complex video queries from minimal examples using active learning and spatio-temporal scene graphs, without requiring database expertise.

Contribution

The paper introduces a query synthesis approach that combines spatio-temporal scene graphs with active learning, so that queries can be synthesized over large, noisy video data from only a handful of user-labeled examples.

Findings

01 Achieves a higher F1 score than the two baseline systems

02 Synthesizes queries faster than the baselines

03 Is more robust to noisy data than the baselines

Abstract

We introduce EQUI-VOCAL: a new system that automatically synthesizes queries over videos from limited user interactions. The user only provides a handful of positive and negative examples of what they are looking for. EQUI-VOCAL utilizes these initial examples and additional ones collected through active learning to efficiently synthesize complex user queries. Our approach enables users to find events without database expertise, with limited labeling effort, and without declarative specifications or sketches. Core to EQUI-VOCAL's design is the use of spatio-temporal scene graphs in its data model and query language and a novel query synthesis approach that works on large and noisy video data. Our system outperforms two baseline systems -- in terms of F1 score, synthesis time, and robustness to noise -- and can flexibly synthesize complex queries that the baselines do not support.
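The workflow the abstract describes (a few labeled examples, extra labels gathered through active learning, and candidate queries ranked by F1 score) can be illustrated with a toy sketch. This is not EQUI-VOCAL's algorithm or API. Every name here (`f1_score`, `most_disputed`, `synthesize`, the triple-based frame encoding, the two candidate queries) is a hypothetical stand-in, and the disagreement-based sampling rule is an assumed, generic active-learning heuristic chosen only for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Assumption: a frame is a set of (subject, predicate, object) triples,
# a toy stand-in for one time step of a spatio-temporal scene graph.
Triple = Tuple[str, str, str]
Segment = List[FrozenSet[Triple]]
Query = Callable[[Segment], bool]

NEAR: Triple = ("car", "near", "person")
LEFT: Triple = ("car", "left_of", "person")


def near(seg: Segment) -> bool:
    """True if the car is near the person in any frame."""
    return any(NEAR in frame for frame in seg)


def near_then_left(seg: Segment) -> bool:
    """True if 'near' holds in some frame and 'left_of' in a later frame."""
    return any(
        NEAR in seg[i] and any(LEFT in later for later in seg[i + 1:])
        for i in range(len(seg))
    )


def f1_score(query: Query, labeled: List[Tuple[Segment, bool]]) -> float:
    """F1 of a query's predictions against the user-provided labels."""
    tp = sum(1 for seg, y in labeled if query(seg) and y)
    fp = sum(1 for seg, y in labeled if query(seg) and not y)
    fn = sum(1 for seg, y in labeled if not query(seg) and y)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def most_disputed(queries: List[Query], unlabeled: List[Segment]) -> int:
    """Index of the segment on which the candidate queries disagree most."""
    def disagreement(seg: Segment) -> int:
        votes = sum(q(seg) for q in queries)
        return min(votes, len(queries) - votes)
    return max(range(len(unlabeled)), key=lambda i: disagreement(unlabeled[i]))


def synthesize(candidates: Dict[str, Query], labeled, unlabeled, oracle, budget):
    """Ask the user (oracle) for up to `budget` labels, then pick the best query."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(budget):
        if not unlabeled:
            break
        seg = unlabeled.pop(most_disputed(list(candidates.values()), unlabeled))
        labeled.append((seg, oracle(seg)))
    return max(candidates, key=lambda name: f1_score(candidates[name], labeled))


# Toy data: the user is looking for "car near person, then car left of person".
seg_a = [frozenset({NEAR}), frozenset({LEFT})]   # matches the target event
seg_b = [frozenset({NEAR})]                      # near only
seg_c = [frozenset({LEFT}), frozenset({NEAR})]   # wrong temporal order
seg_d = [frozenset()]                            # nothing relevant

candidates = {"near": near, "near_then_left": near_then_left}
initial = [(seg_a, True), (seg_d, False)]        # both candidates fit these
best = synthesize(candidates, initial, [seg_b, seg_c], near_then_left, budget=1)
print(best)
```

In this toy run the two initial examples cannot separate the candidates; one additional label on a segment where they disagree is enough to prefer the temporally ordered query.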

Code & Models

Repositories

uwdb/equi-vocal (Official)

Taxonomy

Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques