VQ-Jarvis: Retrieval-Augmented Video Restoration Agent with Sharp Vision and Fast Thought
Xuanyu Zhang, Weiqi Li, Qunliang Xing, Jingfen Xie, Bin Chen, Junlin Li, Li Zhang, Jian Zhang, Shijie Zhao

TL;DR
VQ-Jarvis is a video restoration agent that combines retrieval-augmented decision making with sharper quality perception and a more efficient search strategy, improving restoration quality on videos with complex, heterogeneous degradations.
Contribution
The paper introduces VQ-Jarvis, a retrieval-augmented video restoration agent, together with the large-scale VSR-Compare dataset, perception models, and a hierarchical search strategy.
Findings
Outperforms existing methods on complex degraded videos
Achieves faster and more accurate restoration trajectories
Introduces the VSR-Compare dataset with 20K comparison pairs covering 7 degradation types and 11 enhancement operators
Abstract
Video restoration in real-world scenarios is challenged by heterogeneous degradations, where static architectures and fixed inference pipelines often fail to generalize. Recent agent-based approaches offer dynamic decision making, yet existing video restoration agents remain limited by insufficient quality perception and inefficient search strategies. We propose VQ-Jarvis, a retrieval-augmented, all-in-one intelligent video restoration agent with sharper vision and faster thought. VQ-Jarvis is designed to accurately perceive degradations and subtle differences among paired restoration results, while efficiently discovering optimal restoration trajectories. To enable sharp vision, we construct VSR-Compare, the first large-scale video paired enhancement dataset with 20K comparison pairs covering 7 degradation types, 11 enhancement operators, and diverse content domains. Based on this…
Taxonomy
Topics: Image and Video Quality Assessment · Image Enhancement Techniques · Visual Attention and Saliency Detection
