Self-Enhancing Video Data Management System for Compositional Events with Large Language Models [Technical Report]

Enhao Zhang; Nicole Sullivan; Brandon Haynes; Ranjay Krishna; Magdalena Balazinska

arXiv:2408.02243 · cs.DB · February 20, 2025

TL;DR

This paper presents VOCAL-UDF, a self-enhancing video data management system that uses large language models to automatically generate and select user-defined functions for complex, compositional video queries, improving query performance without requiring predefined modules.

Contribution

Introduces VOCAL-UDF, a novel system that uses LLMs and active learning to automatically construct missing modules for video queries, expanding query capabilities without predefined modules.
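To make "constructing missing modules" concrete, here is a minimal sketch, assuming a compositional query is represented as a list of predicate names and that existing modules live in a simple name-to-function registry. `UDF_REGISTRY`, `find_missing_udfs`, and the predicate names are hypothetical illustrations, not VOCAL-UDF's actual API.

```python
from typing import Callable, Dict, List

# Hypothetical registry of existing modules (UDFs), keyed by concept name.
# Each UDF takes an object's attribute dict and returns True/False.
UDF_REGISTRY: Dict[str, Callable[[dict], bool]] = {
    "is_car": lambda obj: obj["category"] == "car",
    "is_red": lambda obj: obj["color"] == "red",
}


def find_missing_udfs(query_predicates: List[str],
                      registry: Dict[str, Callable[[dict], bool]]) -> List[str]:
    """Return the query predicates that have no registered UDF.

    In a self-enhancing system, these are the modules that would have to be
    generated (for example, by an LLM) before the query can be executed.
    """
    return [p for p in query_predicates if p not in registry]


# Compositional query: "a red car that is moving fast" (hypothetical decomposition).
query = ["is_car", "is_red", "is_moving_fast"]
missing = find_missing_udfs(query, UDF_REGISTRY)
print(missing)  # ['is_moving_fast']
```

Once a UDF for each missing predicate has been generated and registered, the same check returns an empty list and the query becomes executable with existing modules.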

Findings

1. Significantly improves query performance across multiple datasets.
2. Supports both program-based and distilled-model UDFs.
3. Effectively handles ambiguous user intent through candidate UDF generation and active learning.
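One way to picture candidate generation with active learning: several candidate UDFs encode different readings of an ambiguous concept, the system asks the user to label the example on which the candidates disagree most, and candidates that contradict the collected labels score lower. The sketch below uses a simple disagreement-based (query-by-committee style) selection rule and accuracy scoring as stand-in techniques; the concept, thresholds, and all function names are hypothetical and do not describe VOCAL-UDF's implementation.

```python
from typing import Callable, Dict, List

Candidate = Callable[[float], bool]


def disagreement(x: float, candidates: Dict[str, Candidate]) -> float:
    """Minority-vote fraction among candidates (0 = unanimous, 0.5 = even split)."""
    votes = [c(x) for c in candidates.values()]
    positives = sum(votes)
    return min(positives, len(votes) - positives) / len(votes)


def select_query(unlabeled: List[float], candidates: Dict[str, Candidate]) -> float:
    """Pick the unlabeled example on which the candidate UDFs disagree most."""
    return max(unlabeled, key=lambda x: disagreement(x, candidates))


def accuracy(c: Candidate, labeled: Dict[float, bool]) -> float:
    """Fraction of user-labeled examples the candidate classifies correctly."""
    return sum(c(x) == y for x, y in labeled.items()) / len(labeled)


def user_label(speed: float) -> bool:
    """Stands in for the user's (unknown) intended meaning of "fast"."""
    return speed > 45


# Hypothetical ambiguous concept "fast": candidates use different speed thresholds.
candidates: Dict[str, Candidate] = {
    "fast_gt_20": lambda speed: speed > 20,
    "fast_gt_40": lambda speed: speed > 40,
    "fast_gt_60": lambda speed: speed > 60,
}
unlabeled = [10.0, 30.0, 50.0, 70.0]
labeled: Dict[float, bool] = {}

for _ in range(2):  # a small labeling budget
    x = select_query(unlabeled, candidates)
    labeled[x] = user_label(x)  # ask the "user" for a label
    unlabeled.remove(x)

best = max(candidates, key=lambda name: accuracy(candidates[name], labeled))
print(labeled, best)  # {30.0: False, 50.0: True} fast_gt_40
```

Examples on which all candidates agree (10 and 70 here) carry no information for choosing between them, which is why the selection rule skips them and spends the labeling budget on contested cases.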

Abstract

Complex video queries can be answered by decomposing them into modular subtasks. However, existing video data management systems assume the existence of predefined modules for each subtask. We introduce VOCAL-UDF, a novel self-enhancing system that supports compositional queries over videos without the need for predefined modules. VOCAL-UDF automatically identifies and constructs missing modules and encapsulates them as user-defined functions (UDFs), thus expanding its querying capabilities. To achieve this, we formulate a unified UDF model that leverages large language models (LLMs) to aid in new UDF generation. VOCAL-UDF handles a wide range of concepts by supporting both program-based UDFs (i.e., Python functions generated by LLMs) and distilled-model UDFs (lightweight vision models distilled from strong pretrained models). To resolve the inherent ambiguity in user intent, VOCAL-UDF…
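As an illustration of a program-based UDF, the function below is the kind of short Python predicate an LLM might produce for a spatial relationship such as "left of", composed with a category filter to answer a query. This is a hypothetical sketch: the bounding-box format `(x1, y1, x2, y2)`, the name `left_of`, and the center-comparison rule are assumptions, not code from VOCAL-UDF.

```python
from typing import Tuple

# Hypothetical bounding-box format: (x1, y1, x2, y2) in pixels, x grows rightward.
BBox = Tuple[float, float, float, float]


def left_of(subject: BBox, target: BBox) -> bool:
    """Return True if the subject's horizontal center lies left of the target's.

    This is one assumed reading of "left of"; a generated UDF could instead
    compare box edges or require a minimum gap, which is exactly the kind of
    ambiguity in user intent the system has to resolve.
    """
    subject_cx = (subject[0] + subject[2]) / 2
    target_cx = (target[0] + target[2]) / 2
    return subject_cx < target_cx


# Compose with a category check to answer "a person to the left of a car".
objects = [
    {"id": 1, "category": "person", "bbox": (10, 50, 60, 200)},
    {"id": 2, "category": "car", "bbox": (120, 80, 300, 220)},
]
matches = [
    (a["id"], b["id"])
    for a in objects
    for b in objects
    if a["category"] == "person"
    and b["category"] == "car"
    and left_of(a["bbox"], b["bbox"])
]
print(matches)  # [(1, 2)]
```

A program-based UDF like this is cheap and interpretable, which suits geometric relationships; concepts that are hard to express as code are where a distilled-model UDF would be the better fit.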

Code & Models

Repositories

uwdb/VOCAL-UDF (PyTorch, official)

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Video Analysis and Summarization