LazyVLM: Neuro-Symbolic Approach to Video Analytics

Xiangru Jian; Wei Pang; Zhengyuan Dong; Chao Zhang; M. Tamer Özsu

arXiv:2505.21459 · cs.DB · May 28, 2025

TL;DR

LazyVLM is a neuro-symbolic video analytics system that combines user-friendly querying with scalable processing. It decomposes complex multi-frame video queries into fine-grained operations executed through relational query processing and vector similarity search, addressing the scalability limitations of end-to-end VLMs.

Contribution

LazyVLM introduces a scalable neuro-symbolic framework that supports complex multi-frame video queries through a semi-structured text interface, aiming to keep the ease of use of VLM-style querying while improving efficiency.

Findings

1. Provides robust and efficient video querying at scale.

2. Decomposes complex queries into fine-grained operations.

3. Outperforms end-to-end VLMs in scalability and flexibility.

Abstract

Current video analytics approaches face a fundamental trade-off between flexibility and efficiency. End-to-end Vision Language Models (VLMs) often struggle with long-context processing and incur high computational costs, while neuro-symbolic methods depend heavily on manual labeling and rigid rule design. In this paper, we introduce LazyVLM, a neuro-symbolic video analytics system that provides a user-friendly query interface similar to VLMs, while addressing their scalability limitations. LazyVLM enables users to effortlessly drop in video data and specify complex multi-frame video queries using a semi-structured text interface for video analytics. To address the scalability limitations of VLMs, LazyVLM decomposes multi-frame video queries into fine-grained operations and offloads the bulk of the processing to efficient relational query execution and vector similarity search. We…
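The abstract describes the decomposition only at a high level. As an illustration of the general idea, and not LazyVLM's actual implementation, the minimal sketch below splits a two-step query ("a person holds a cup, and later the same person drinks from it") into a vector-similarity step that resolves query words to detected object labels, followed by a relational join that enforces the relationship, identity, and temporal-order constraints. Everything here is a hypothetical assumption: the `objects` and `triples` tables, the toy 3-dimensional label embeddings, the 0.95 threshold, and the function names `similar_labels`, `build_demo_db`, and `run_query`. Python's built-in `sqlite3` stands in for a relational engine, and a brute-force cosine scan stands in for a vector index.

```python
import math
import sqlite3

# Hypothetical toy embeddings for object labels. A real system would obtain
# these from a learned encoder and store them in a vector index.
LABEL_EMB = {
    "person": [0.9, 0.1, 0.0],
    "man":    [0.85, 0.15, 0.05],
    "cup":    [0.0, 0.9, 0.2],
    "mug":    [0.05, 0.85, 0.25],
    "car":    [0.1, 0.0, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_labels(text, threshold=0.95):
    """Vector-similarity step: brute-force scan for labels close to `text`.

    In this toy sketch `text` must itself be one of the labels in LABEL_EMB.
    """
    q = LABEL_EMB[text]
    return sorted(label for label, emb in LABEL_EMB.items()
                  if cosine(q, emb) >= threshold)


def build_demo_db():
    """Hypothetical schema: tracked objects plus per-frame relationship triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (track_id INTEGER PRIMARY KEY, label TEXT)")
    conn.execute("CREATE TABLE triples (frame INTEGER, subj INTEGER, pred TEXT, obj INTEGER)")
    conn.executemany("INSERT INTO objects VALUES (?, ?)",
                     [(1, "man"), (2, "mug"), (3, "car"), (4, "person")])
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?, ?)", [
        (1, 1, "holding", 2),
        (2, 1, "near", 3),
        (3, 4, "drinking_from", 2),
        (5, 1, "drinking_from", 2),
        (7, 4, "holding", 2),  # wrong temporal order for the query below
    ])
    return conn


def run_query(conn, subj_text, pred_a, pred_b, obj_text):
    """Relational step: find (subj pred_a obj) followed later by (subj pred_b obj)."""
    subj_labels = similar_labels(subj_text)
    obj_labels = similar_labels(obj_text)
    subj_ph = ",".join("?" * len(subj_labels))  # e.g. "?,?"
    obj_ph = ",".join("?" * len(obj_labels))
    sql = f"""
        SELECT a.subj, a.obj, a.frame, b.frame
        FROM triples AS a
        JOIN triples AS b
          ON a.subj = b.subj AND a.obj = b.obj AND a.frame < b.frame
        JOIN objects AS s ON s.track_id = a.subj
        JOIN objects AS o ON o.track_id = a.obj
        WHERE a.pred = ? AND b.pred = ?
          AND s.label IN ({subj_ph}) AND o.label IN ({obj_ph})
        ORDER BY a.frame, b.frame
    """
    params = [pred_a, pred_b, *subj_labels, *obj_labels]
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(run_query(conn, "person", "holding", "drinking_from", "cup"))
    # → [(1, 2, 1, 5)]
```

In this sketch, "person" matches the detected labels `man` and `person`, and "cup" matches `cup` and `mug`. Track 4 is excluded because it drinks (frame 3) before it holds (frame 7). The only neural work assumed here is whatever produces the per-frame triples and label embeddings; matching, identity, and ordering are handled by cheap similarity scans and SQL joins.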

Taxonomy

Topics: Human Pose and Action Recognition