LOVO: Efficient Complex Object Query in Large-Scale Video Datasets

Yuxin Liu; Yuezhang Peng; Hefeng Zhou; Hongze Liu; Xinyu Lu; Jiong Lou; Chentao Wu; Wei Zhao; Jie Li

arXiv:2507.14301 · cs.IR · July 22, 2025

TL;DR

LOVO is a scalable system that enables efficient complex object queries in large-scale video datasets by using pre-trained visual encoders, compact embeddings, and an inverted multi-index structure to achieve low-latency search and high accuracy.

Contribution

LOVO introduces a novel, scalable approach combining one-time feature extraction, an inverted multi-index, and cross-modal reranking for efficient complex object querying in large video datasets.
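This page does not give LOVO's implementation details, so the following is only a minimal sketch of how an inverted multi-index with a rerank step can work in general. It assumes a two-way split of each embedding, small k-means codebooks, and exact L2 distance as a stand-in for the cross-modal reranker. The names `kmeans` and `InvertedMultiIndex`, and all parameter values, are hypothetical and not taken from the paper.

```python
import heapq
from collections import defaultdict

import numpy as np


def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = x[assign == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids


class InvertedMultiIndex:
    """Toy inverted multi-index: one codebook per half of the vector."""

    def __init__(self, embeddings, k=8):
        self.embeddings = embeddings
        self.half = embeddings.shape[1] // 2
        left, right = embeddings[:, :self.half], embeddings[:, self.half:]
        self.cb_left = kmeans(left, k, seed=0)
        self.cb_right = kmeans(right, k, seed=1)
        # Each cell (i, j) holds the ids whose halves map to centroids i and j.
        self.cells = defaultdict(list)
        li = self._nearest(left, self.cb_left)
        rj = self._nearest(right, self.cb_right)
        for idx, cell in enumerate(zip(li.tolist(), rj.tolist())):
            self.cells[cell].append(idx)

    @staticmethod
    def _nearest(x, codebook):
        d = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    def candidates(self, query, n_candidates):
        """Visit cells in order of increasing summed half-distances."""
        dl = ((self.cb_left - query[:self.half]) ** 2).sum(axis=1)
        dr = ((self.cb_right - query[self.half:]) ** 2).sum(axis=1)
        ol, orr = np.argsort(dl), np.argsort(dr)
        heap = [(dl[ol[0]] + dr[orr[0]], 0, 0)]
        seen = {(0, 0)}
        out = []
        while heap and len(out) < n_candidates:
            _, a, b = heapq.heappop(heap)
            out.extend(self.cells.get((int(ol[a]), int(orr[b])), []))
            for na, nb in ((a + 1, b), (a, b + 1)):
                if na < len(ol) and nb < len(orr) and (na, nb) not in seen:
                    seen.add((na, nb))
                    heapq.heappush(heap, (dl[ol[na]] + dr[orr[nb]], na, nb))
        return out

    def search(self, query, top_k=5, n_candidates=50):
        cand = np.array(self.candidates(query, n_candidates), dtype=np.int64)
        # Assumption: exact L2 distance stands in for a cross-modal reranker.
        exact = ((self.embeddings[cand] - query) ** 2).sum(axis=1)
        return cand[np.argsort(exact)[:top_k]].tolist()


# Usage with synthetic data (random vectors stand in for visual embeddings).
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 16))
index = InvertedMultiIndex(data, k=8)
hits = index.search(data[42], top_k=3, n_candidates=20)
```

The index is built once, independent of any query. Each search then touches only the nearest cells and applies the more expensive scoring to a short candidate list, which is the general reason such a structure can reduce search latency.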

Findings

01 Outperforms existing methods in query accuracy.
02 Achieves up to 85x lower search latency.
03 Reduces index construction costs significantly.

Abstract

The widespread deployment of cameras has led to an exponential increase in video data, creating vast opportunities for applications such as traffic management and crime surveillance. However, querying specific objects from large-scale video datasets presents challenges, including (1) processing massive and continuously growing data volumes, (2) supporting complex query requirements, and (3) ensuring low-latency execution. Existing video analysis methods either adapt poorly to unseen object classes or suffer from high query latency. In this paper, we present LOVO, a novel system designed to efficiently handle comp$\underline{L}$ex $\underline{O}$bject queries in large-scale $\underline{V}$ide$\underline{O}$ datasets. Agnostic to user queries, LOVO performs one-time feature extraction using pre-trained visual encoders, generating compact visual embeddings for key…
