Prompt-aware of Frame Sampling for Efficient Text-Video Retrieval

Deyu Zhang; Tingting Long; Jinrui Zhang; Ligeng Chen; Ju Ren; Yaoxue Zhang

arXiv:2507.15491 · cs.MM · July 22, 2025

TL;DR

ProCLIP introduces a prompt-aware frame sampling method combined with a two-stage candidate pruning strategy to improve the efficiency and accuracy of text-video retrieval on edge devices.
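The TL;DR does not say how the two-stage candidate pruning works. One common pattern for this kind of pruning is coarse-to-fine retrieval: a cheap scorer ranks the whole gallery, and an expensive scorer re-ranks only the survivors. The following is a minimal sketch that assumes this pattern. `two_stage_retrieve`, `keep`, `top_k`, and the score lists are hypothetical toy values, not ProCLIP's models, code, or data.

```python
from typing import Callable, List, Tuple


def two_stage_retrieve(
    num_videos: int,
    coarse_score: Callable[[int], float],
    fine_score: Callable[[int], float],
    keep: int,
    top_k: int,
) -> List[Tuple[int, float]]:
    """Hypothetical coarse-to-fine candidate pruning (not ProCLIP's actual code).

    Stage 1 ranks every video with a cheap scorer and keeps `keep` candidates.
    Stage 2 re-scores only those candidates with an expensive scorer.
    """
    # Stage 1: cheap score for the whole gallery; discard all but the best `keep`.
    ranked = sorted(range(num_videos), key=coarse_score, reverse=True)
    survivors = ranked[:keep]
    # Stage 2: expensive score only for the survivors, then re-rank them.
    rescored = [(vid, fine_score(vid)) for vid in survivors]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_k]


# Toy similarity scores for one text query against five videos (made-up values).
coarse_sims = [0.1, 0.9, 0.4, 0.8, 0.3]  # from a hypothetical lightweight model
fine_sims = [0.95, 0.2, 0.7, 0.6, 0.1]   # from a hypothetical heavy model

fine_calls: List[int] = []  # records which videos reach the expensive stage


def fine_scorer(vid: int) -> float:
    fine_calls.append(vid)
    return fine_sims[vid]


result = two_stage_retrieve(5, lambda v: coarse_sims[v], fine_scorer, keep=3, top_k=2)
print(result)           # [(2, 0.7), (3, 0.6)]
print(len(fine_calls))  # 3
```

Only three of the five videos reach the expensive stage. The toy data also shows the risk: video 0 has the highest fine score but is pruned in stage 1. In a design like this, `keep` controls how much accuracy is traded for speed.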

Contribution

The paper proposes a prompt-aware frame sampling technique and a two-stage candidate pruning approach, which together significantly improve retrieval efficiency without sacrificing accuracy.

Findings

01. 75.3% latency reduction compared to baselines
02. Maintains competitive accuracy, with R@1 = 49.0 on MSR-VTT
03. Effective balance of content coverage and computational cost
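R@1 (Recall@1) is the percentage of text queries whose ground-truth video is ranked first among all candidate videos. R@K generalizes this to the top K. The following is a minimal sketch of text-to-video Recall@K. It assumes a text-by-video similarity matrix in which query i's correct video is video i, and it uses toy numbers. `recall_at_k` is a hypothetical helper, and the example does not reproduce the reported MSR-VTT figure.

```python
from typing import List


def recall_at_k(sim: List[List[float]], k: int) -> float:
    """Text-to-video Recall@K in percent.

    sim[i][j] is the similarity between text query i and video j; the ground-truth
    video for query i is assumed to be video i (diagonal pairing).
    """
    hits = 0
    for i, row in enumerate(sim):
        # 0-based rank of the correct video = number of videos scoring strictly higher
        # (ties are resolved in the correct video's favor).
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy 4x4 similarity matrix: 4 text queries x 4 videos (made-up values).
sim = [
    [0.9, 0.1, 0.3, 0.2],  # correct video ranked 1st
    [0.2, 0.4, 0.8, 0.1],  # correct video ranked 2nd
    [0.1, 0.3, 0.7, 0.5],  # correct video ranked 1st
    [0.6, 0.2, 0.1, 0.5],  # correct video ranked 2nd
]
print(recall_at_k(sim, 1))  # 50.0
print(recall_at_k(sim, 2))  # 100.0
```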

Abstract

Enabling efficient text-video retrieval on edge-end devices is critical for real-world applications. Yet, existing methods face a critical challenge in balancing accuracy and computational efficiency: uniform frame sampling methods ensure content coverage but incur prohibitive computational costs, while salient-frame sampling methods reduce overhead but suffer from query-agnostic frame selection that biases retrieval results. To address this, we propose ProCLIP, a user-centric framework that achieves state-of-the-art accuracy with significantly improved efficiency. We design a prompt-aware frame sampling strategy that dynamically guides lightweight feature extractors using textual prompts to select semantically relevant frames, overcoming the limitations of existing salient-frame sampling methods which rely on static, query-agnostic selection criteria. Moreover, we adopt a two-stage…
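The abstract contrasts uniform sampling, query-agnostic salient-frame sampling, and prompt-aware sampling. The following sketch illustrates the difference in plain Python on toy 2-D "embeddings", where dimension 0 stands for a dog and dimension 1 for a car. It is not ProCLIP's implementation. The frame and prompt vectors stand in for outputs of a lightweight image encoder and a text encoder. The salient heuristic (largest change from the previous frame) is one assumed example of query-agnostic selection. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def uniform_sample(num_frames: int, k: int) -> List[int]:
    """Evenly spaced frame indices: ignores both frame content and the query."""
    step = num_frames / k
    return [int(i * step) for i in range(k)]


def salient_sample(frames: List[Vector], k: int) -> List[int]:
    """Assumed query-agnostic heuristic: keep frames that differ most from the previous one."""
    change = [0.0] + [1.0 - cosine(frames[i], frames[i - 1]) for i in range(1, len(frames))]
    ranked = sorted(range(len(frames)), key=lambda i: change[i], reverse=True)
    return sorted(ranked[:k])  # restore temporal order


def prompt_aware_sample(frames: List[Vector], prompt: Vector, k: int) -> List[int]:
    """Keep the k frames most similar to the text prompt embedding, in temporal order."""
    ranked = sorted(range(len(frames)), key=lambda i: cosine(frames[i], prompt), reverse=True)
    return sorted(ranked[:k])


# Toy video: dog frames except for a short car segment at frames 5-6.
frames = [
    [1.0, 0.0], [1.0, 0.0], [0.9, 0.1], [1.0, 0.0],
    [1.0, 0.0], [0.0, 1.0], [0.1, 0.9], [1.0, 0.0],
]
prompt = [0.0, 1.0]  # hypothetical text embedding for "a car driving by"

print(uniform_sample(len(frames), 2))           # [0, 4]
print(salient_sample(frames, 2))                # [5, 7]
print(prompt_aware_sample(frames, prompt, 2))   # [5, 6]
```

With a prompt about a car, uniform sampling picks two dog frames. The change-based heuristic picks one car frame plus the cut back to the dog, because it cannot know what the query asks for. Prompt-aware selection picks both car frames.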
