Self-Enhancing Video Data Management System for Compositional Events with Large Language Models [Technical Report]
Enhao Zhang, Nicole Sullivan, Brandon Haynes, Ranjay Krishna, Magdalena Balazinska

TL;DR
This paper presents VOCAL-UDF, a self-enhancing video data management system that uses large language models to automatically generate and select user-defined functions for complex, compositional video queries, improving performance without predefined modules.
Contribution
Introduces VOCAL-UDF, a system that uses LLMs and active learning to automatically construct the modules a video query is missing, expanding query capabilities without relying on predefined modules.
Findings
Significantly improves query performance across multiple datasets.
Supports both program-based and distilled-model UDFs.
Effectively handles ambiguous user intent with candidate generation and active learning.
Abstract
Complex video queries can be answered by decomposing them into modular subtasks. However, existing video data management systems assume the existence of predefined modules for each subtask. We introduce VOCAL-UDF, a novel self-enhancing system that supports compositional queries over videos without the need for predefined modules. VOCAL-UDF automatically identifies and constructs missing modules and encapsulates them as user-defined functions (UDFs), thus expanding its querying capabilities. To achieve this, we formulate a unified UDF model that leverages large language models (LLMs) to aid in new UDF generation. VOCAL-UDF handles a wide range of concepts by supporting both program-based UDFs (i.e., Python functions generated by LLMs) and distilled-model UDFs (lightweight vision models distilled from strong pretrained models). To resolve the inherent ambiguity in user intent, VOCAL-UDF…
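To make the idea of a program-based UDF concrete, here is a minimal Python sketch. It is not the VOCAL-UDF implementation: the `Box` schema, the two candidate functions, and the `select_udf` helper are all hypothetical names introduced for illustration. It assumes a UDF is a predicate over two object bounding boxes, and that one of several candidate interpretations of an ambiguous concept ("left of") is chosen by agreement with a handful of user-labeled examples.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical schema)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def cx(self) -> float:
        """Horizontal center of the box."""
        return (self.x1 + self.x2) / 2


# In this sketch, a UDF is any boolean predicate over a pair of boxes.
RelationUDF = Callable[[Box, Box], bool]
LabeledPair = Tuple[Box, Box, bool]


def left_of_by_center(a: Box, b: Box) -> bool:
    """Candidate 1: a's center lies to the left of b's center."""
    return a.cx < b.cx


def left_of_strict(a: Box, b: Box) -> bool:
    """Candidate 2: a lies entirely to the left of b (no horizontal overlap)."""
    return a.x2 < b.x1


def select_udf(candidates: List[RelationUDF], labeled: List[LabeledPair]) -> RelationUDF:
    """Return the candidate that agrees with the most labeled examples.

    Ties are broken in favor of the earlier candidate in the list.
    """
    def accuracy(udf: RelationUDF) -> float:
        return sum(udf(a, b) == y for a, b, y in labeled) / len(labeled)

    return max(candidates, key=accuracy)


# Hypothetical user labels: overlapping boxes still count as "left of".
LABELED: List[LabeledPair] = [
    (Box(0, 0, 10, 10), Box(5, 0, 15, 10), True),
    (Box(0, 0, 4, 4), Box(6, 0, 10, 4), True),
    (Box(6, 0, 10, 4), Box(0, 0, 4, 4), False),
]

best = select_udf([left_of_by_center, left_of_strict], LABELED)
```

With these labels the center-based candidate matches all three examples while the strict candidate misses the overlapping pair, so `best` is `left_of_by_center`. Which examples get labeled, and how candidates are generated, is left open in this sketch.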
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Video Analysis and Summarization
