Learning Partially-Decorrelated Common Spaces for Ad-hoc Video Search

Fan Hu; Zijie Xin; Xirong Li

arXiv:2508.02340 · cs.CV · August 5, 2025


TL;DR

This paper introduces LPD, a method for ad-hoc video search (AVS) that learns multiple partially decorrelated common spaces to better capture the visual diversity of relevant videos and improve retrieval accuracy.

Contribution

LPD builds a feature-specific common space for each video feature and trains these spaces with a de-correlation loss and a fair multi-space triplet loss, improving both the diversity and the effectiveness of video search.
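This page does not state how either loss is formulated, so the following is only a minimal NumPy sketch of what such losses could look like, not the authors' implementation. It assumes each common space produces a B x B text-video similarity matrix whose diagonal holds the positive pairs. The hypothetical `fair_multispace_triplet_loss` applies a hardest-negative hinge loss in every space and averages the spaces with equal weight (one reading of "fair"). The hypothetical `decorrelation_loss` penalizes the mean absolute Pearson correlation between the negative (off-diagonal) scores of each pair of spaces, an assumed reading of "partially" decorrelated in which positives stay aligned while negatives are ranked differently.

```python
import numpy as np


def fair_multispace_triplet_loss(sims, margin=0.2):
    """sims: list of K (B x B) text-video similarity matrices; [i, i] is the positive pair.

    Hypothetical formulation: hardest-negative hinge loss in each space,
    then an equal-weight average over spaces so no single space dominates.
    """
    losses = []
    for s in sims:
        b = s.shape[0]
        pos = np.diag(s)
        # Mask out positives so max() only sees negatives.
        neg = np.where(np.eye(b, dtype=bool), -np.inf, s)
        hardest_video = neg.max(axis=1)  # hardest negative video for each text query
        hardest_text = neg.max(axis=0)   # hardest negative text for each video
        per_pair = (np.maximum(0.0, margin + hardest_video - pos)
                    + np.maximum(0.0, margin + hardest_text - pos))
        losses.append(per_pair.mean())
    return float(np.mean(losses))


def decorrelation_loss(sims):
    """Mean absolute Pearson correlation between the off-diagonal (negative)
    scores of every pair of spaces. Hypothetical formulation."""
    b = sims[0].shape[0]
    off = ~np.eye(b, dtype=bool)
    flat = [s[off] for s in sims]
    total, pairs = 0.0, 0
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            a = flat[i] - flat[i].mean()
            c = flat[j] - flat[j].mean()
            denom = np.sqrt((a * a).sum() * (c * c).sum()) + 1e-8
            total += abs(float((a * c).sum() / denom))
            pairs += 1
    return total / pairs if pairs else 0.0


def total_loss(sims, margin=0.2, lam=0.1):
    # lam is a hypothetical weight balancing retrieval quality against space diversity.
    return fair_multispace_triplet_loss(sims, margin) + lam * decorrelation_loss(sims)
```

Two spaces that rank negatives identically give a decorrelation penalty near 1, while spaces whose negative scores are uncorrelated give a penalty near 0. Minimizing this penalty therefore pushes the spaces toward different views of the collection without touching the positive pairs.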

Findings

01. LPD outperforms existing methods on the TRECVID AVS benchmarks.

02. LPD increases the diversity of search results.

03. Visualizations show greater separation between the learned feature spaces.

Abstract

Ad-hoc Video Search (AVS) involves using a textual query to search for multiple relevant videos in a large collection of unlabeled short videos. The main challenge of AVS is the visual diversity of relevant videos. A simple query such as "Find shots of a man and a woman dancing together indoors" can span a multitude of environments, from brightly lit halls and shadowy bars to dance scenes in black-and-white animations. It is therefore essential to retrieve relevant videos as comprehensively as possible. Current solutions for the AVS task primarily fuse multiple features into one or more common spaces, yet overlook the need for diverse spaces. To fully exploit the expressive capability of individual features, we propose LPD, short for Learning Partially Decorrelated common spaces. LPD incorporates two key innovations: feature-specific common space construction and the de-correlation…
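The idea of feature-specific common spaces can be illustrated with a small NumPy sketch. It is not the authors' architecture. It assumes one linear text projection and one linear video projection per video feature, cosine similarity inside each space, and a plain average of the per-space similarities as the fusion rule. The class `FeatureSpecificSpaces`, its random projections, and all dimensions are hypothetical placeholders for learned networks.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


class FeatureSpecificSpaces:
    """Hypothetical: one common space per video feature type.

    Each space has its own text projection and its own video projection,
    so every video feature is matched to the query in a dedicated space
    instead of being fused into a single shared space.
    """

    def __init__(self, text_dim, video_dims, space_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Random matrices stand in for learned projection layers.
        self.text_proj = [rng.standard_normal((text_dim, space_dim)) / np.sqrt(text_dim)
                          for _ in video_dims]
        self.video_proj = [rng.standard_normal((d, space_dim)) / np.sqrt(d)
                           for d in video_dims]

    def space_similarities(self, text, videos):
        """text: (T, text_dim); videos: list of (V, d_k) arrays, one per feature.
        Returns a list of (T, V) cosine-similarity matrices, one per space."""
        sims = []
        for wt, wv, feats in zip(self.text_proj, self.video_proj, videos):
            t = l2_normalize(text @ wt)
            v = l2_normalize(feats @ wv)
            sims.append(t @ v.T)
        return sims

    def score(self, text, videos):
        # Assumed fusion rule: average the cosine similarities across spaces.
        return np.mean(self.space_similarities(text, videos), axis=0)
```

A usage example with three hypothetical video features of different sizes:

```python
dims = [32, 24, 8]
model = FeatureSpecificSpaces(text_dim=16, video_dims=dims, space_dim=12)
rng = np.random.default_rng(1)
queries = rng.standard_normal((2, 16))
videos = [rng.standard_normal((5, d)) for d in dims]
ranking = np.argsort(-model.score(queries, videos), axis=1)  # best video first, per query
```

Keeping the spaces separate is what makes a diversity objective possible at all: with a single fused space there is only one ranking, whereas per-feature spaces each produce their own ranking that a de-correlation term can push apart.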
