ReCoVR: Closing the Loop in Interactive Composed Video Retrieval
Bingqing Zhang, Yi Zhang, Zhuo Cao, Yang Li, Xue Li, Jiajun Liu, Sen Wang

TL;DR
ReCoVR introduces a multi-turn, interactive video retrieval system that uses a dual-pathway architecture to incorporate user feedback and self-reflection, improving retrieval accuracy (74.30% R@1 after one round on WebVid-CoVR-Test).
Contribution
It formalizes multi-turn interactive composed video retrieval and proposes ReCoVR, a novel reflexive architecture that enhances retrieval through diagnostic feedback and trajectory monitoring.
Findings
ReCoVR achieves 74.30% R@1 after one round on WebVid-CoVR-Test.
The dual-pathway design improves retrieval accuracy over existing methods.
ReCoVR effectively monitors and corrects retrieval trajectories across turns.
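R@1 is read here under the usual Recall@K definition: the fraction of queries whose ground-truth target video appears among the top K retrieved candidates. Under that reading, 74.30% R@1 means the target was ranked first for 74.30% of test queries. The sketch below is a generic Recall@K, not the paper's evaluation code. The function name `recall_at_k` and the toy rankings are hypothetical, and the paper's exact protocol (for example, how "after one round" of feedback is scored) is not described in this summary.

```python
def recall_at_k(ranked_ids, target_ids, k=1):
    """Fraction of queries whose target appears in the top-k ranked results.

    ranked_ids: one ranked list of candidate video ids per query (hypothetical toy data).
    target_ids: the ground-truth video id for each query, in the same order.
    """
    if len(ranked_ids) != len(target_ids):
        raise ValueError("need exactly one ranking per query")
    hits = sum(1 for ranking, target in zip(ranked_ids, target_ids) if target in ranking[:k])
    return hits / len(target_ids)


# Toy example: four queries, each with a ranked list of three candidates.
rankings = [["v3", "v1", "v7"], ["v2", "v5", "v9"], ["v8", "v4", "v6"], ["v1", "v0", "v2"]]
targets = ["v3", "v5", "v6", "v1"]
print(recall_at_k(rankings, targets, k=1))  # 0.5: targets ranked first for 2 of 4 queries
print(recall_at_k(rankings, targets, k=3))  # 1.0: every target is within the top 3
```

Reporting several K values side by side (R@1, R@5, R@10) is common because R@1 only credits an exact first-place hit.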
Abstract
Composed video retrieval (CoVR) searches for target videos using a reference video and a modification text, but existing methods are restricted to a single interaction round and cannot support the progressive nature of real-world visual search. To bridge this gap, we first formalize interactive composed video retrieval, a multi-turn extension of CoVR, where users progressively refine their search intent through natural-language feedback across turns. Adapting existing interactive retrieval methods to this setting reveals two structural weaknesses: reliance on a single retrieval channel and an open-loop retrieval design that consumes user feedback but does not diagnose whether its own retrieval trajectory is drifting or stagnating. To address these limitations, we propose ReCoVR (Reflexive Composed Video Retrieval), a dual-pathway architecture built on reflexive perception, where the…
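The abstract contrasts an open-loop design, which consumes feedback without checking its own progress, with a closed loop that diagnoses whether the retrieval trajectory is drifting or stagnating. The sketch below illustrates that distinction under loudly stated assumptions. It is not ReCoVR's method, and the truncated abstract does not say how its two pathways are combined or how drift and stagnation are measured. Everything here is hypothetical: `fuse_scores` combines two retrieval channels with a weighted sum, stagnation is flagged when consecutive turns return nearly the same top-k set (Jaccard overlap), and drift is flagged when the current query embedding moves away from an embedding of the user's accumulated intent (cosine similarity). Both thresholds are arbitrary.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fuse_scores(scores_a, scores_b, alpha=0.5):
    """Hypothetical dual-pathway fusion: weighted sum of two channels' per-video scores.

    Assumes both channels score the same set of video ids.
    """
    return {vid: alpha * scores_a[vid] + (1 - alpha) * scores_b[vid] for vid in scores_a}


def top_k(scores, k):
    """Video ids of the k highest-scoring candidates, best first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [vid for vid, _ in ranked[:k]]


def diagnose(prev_top, curr_top, query_vec, intent_vec,
             stagnation_overlap=0.8, drift_sim=0.5):
    """Label the current turn 'stagnating', 'drifting', or 'ok' (hypothetical rules).

    Stagnation: Jaccard overlap of consecutive top-k sets is at or above a threshold.
    Drift: the query embedding's cosine similarity to the accumulated-intent
    embedding falls below a threshold.
    """
    union = set(prev_top) | set(curr_top)
    overlap = len(set(prev_top) & set(curr_top)) / max(len(union), 1)
    if overlap >= stagnation_overlap:
        return "stagnating"
    if cosine(query_vec, intent_vec) < drift_sim:
        return "drifting"
    return "ok"


# Toy usage with made-up scores from two channels.
scores_a = {"v1": 0.9, "v2": 0.4, "v3": 0.7, "v4": 0.1}
scores_b = {"v1": 0.2, "v2": 0.8, "v3": 0.6, "v4": 0.3}
curr = top_k(fuse_scores(scores_a, scores_b), 2)
print(curr)                                            # ['v3', 'v2']
print(diagnose(["v3", "v2"], curr, [1, 0], [1, 0]))    # stagnating: same top-2 as last turn
print(diagnose(["v1", "v4"], curr, [0, 1], [1, 0]))    # drifting: query orthogonal to intent
print(diagnose(["v1", "v4"], curr, [1, 0.1], [1, 0]))  # ok
```

In an open-loop system, the labels returned by `diagnose` would simply not exist; the point of the sketch is that a closed loop produces a signal it can act on, for instance by reweighting pathways or re-anchoring the query, before the next turn.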
