LazyVLM: Neuro-Symbolic Approach to Video Analytics
Xiangru Jian, Wei Pang, Zhengyuan Dong, Chao Zhang, M. Tamer Özsu

TL;DR
LazyVLM is a neuro-symbolic video analytics system that combines user-friendly querying with scalable processing by decomposing complex video queries into efficient operations, addressing the limitations of existing VLMs.
Contribution
LazyVLM introduces a scalable neuro-symbolic framework that enables complex multi-frame video queries through a semi-structured interface, improving efficiency and flexibility.
Findings
Provides robust and efficient video querying at scale.
Decomposes complex queries into fine-grained operations.
Outperforms traditional VLMs in scalability and flexibility.
Abstract
Current video analytics approaches face a fundamental trade-off between flexibility and efficiency. End-to-end Vision Language Models (VLMs) often struggle with long-context processing and incur high computational costs, while neural-symbolic methods depend heavily on manual labeling and rigid rule design. In this paper, we introduce LazyVLM, a neuro-symbolic video analytics system that provides a user-friendly query interface similar to VLMs, while addressing their scalability limitation. LazyVLM enables users to effortlessly drop in video data and specify complex multi-frame video queries using a semi-structured text interface for video analytics. To address the scalability limitations of VLMs, LazyVLM decomposes multi-frame video queries into fine-grained operations and offloads the bulk of the processing to efficient relational query execution and vector similarity search. We…
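To make the decomposition idea concrete, here is a minimal sketch of how a two-step multi-frame query might be split into a vector similarity search followed by a relational join. It is not LazyVLM's implementation. The toy entity index, the three-dimensional embeddings, the 0.8 similarity threshold, and the names `similarity_search` and `run_query` are all hypothetical and chosen only for illustration.

```python
import math
import sqlite3

# Hypothetical toy index of detected entities: (frame_id, entity_id, embedding).
# A real system would store learned embeddings in a vector database.
ENTITY_INDEX = [
    (0, "e1", [0.9, 0.1, 0.0]),    # car-like
    (1, "e2", [0.1, 0.9, 0.0]),    # person-like
    (5, "e3", [0.95, 0.05, 0.0]),  # car-like
    (6, "e4", [0.0, 0.1, 0.9]),    # something else
    (7, "e5", [0.2, 0.8, 0.1]),    # person-like
]

# Hypothetical embeddings of the text descriptions used in the query.
QUERY_EMBEDDINGS = {"car": [1.0, 0.0, 0.0], "person": [0.0, 1.0, 0.0]}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_search(label, threshold=0.8):
    """Step 1: return (frame_id, entity_id, label) for entities matching the label."""
    query = QUERY_EMBEDDINGS[label]
    return [
        (frame_id, entity_id, label)
        for frame_id, entity_id, emb in ENTITY_INDEX
        if cosine(query, emb) >= threshold
    ]


def run_query(first, second, max_gap):
    """Step 2: relational join that enforces the multi-frame (temporal) constraint.

    Finds frame pairs where `first` appears and `second` appears in a later
    frame at most `max_gap` frames afterwards.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE matches (frame_id INTEGER, entity_id TEXT, label TEXT)"
    )
    for label in (first, second):
        conn.executemany(
            "INSERT INTO matches VALUES (?, ?, ?)", similarity_search(label)
        )
    rows = conn.execute(
        """
        SELECT a.frame_id, b.frame_id
        FROM matches a JOIN matches b
          ON b.frame_id > a.frame_id AND b.frame_id - a.frame_id <= ?
        WHERE a.label = ? AND b.label = ?
        ORDER BY a.frame_id, b.frame_id
        """,
        (max_gap, first, second),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_query("car", "person", max_gap=2))
```

In this sketch the similarity search narrows the candidates and the cross-frame reasoning is handled entirely by the SQL join, so no model call is needed for the temporal constraint. Whether and where a VLM is invoked for the remaining candidates is not covered here.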
Taxonomy
Topics: Human Pose and Action Recognition
