PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization

Bing Fan; Yunhe Feng; Yapeng Tian; James Chenhao Liang; Yuewei Lin; Yan Huang; Heng Fan

arXiv:2502.07707 · cs.CV · July 2, 2025


TL;DR

PRVQL introduces a progressive, knowledge-guided refinement framework for egocentric visual query localization, effectively handling appearance changes and clutter by iteratively improving target features and localization accuracy.

Contribution

It proposes a novel multi-stage framework that mines target-relevant knowledge directly from the video and uses it to progressively refine query and video features for robust localization in egocentric videos.

Findings

01

Achieves state-of-the-art results on the Ego4D dataset.

02

Significantly outperforms previous methods in complex scenes.

03

Demonstrates effective knowledge-guided feature refinement.

Abstract

Egocentric visual query localization (EgoVQL) focuses on localizing the target of interest in space and time in first-person videos, given a visual query. Despite recent progress, existing methods often struggle with severe object appearance changes and cluttered backgrounds because they lack sufficient target cues, which degrades localization. To address this, we introduce PRVQL, a novel Progressive knowledge-guided Refinement framework for EgoVQL. Its core idea is to continuously exploit target-relevant knowledge directly from videos and use it as guidance to refine both query and video features, improving target localization. PRVQL contains multiple processing stages. The target knowledge from one stage, comprising appearance and spatial knowledge extracted via two specially designed knowledge learning modules, is utilized as guidance to refine the query and…
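To make the multi-stage idea concrete, here is a toy, framework-free sketch of progressive knowledge-guided refinement. It is not PRVQL's architecture. In the paper, the knowledge learning modules are learned networks operating on real video features. Everything here is a hypothetical stand-in chosen for illustration:

- `refine_stage` and `progressive_refine` are illustrative helpers.
- The hyperparameters `alpha`, `beta`, and `top_k` are invented for this sketch.
- Localization is done by cosine similarity.
- "Appearance knowledge" is the mean of the top-k most confident detections.
- "Spatial knowledge" is a per-frame softmax attention over regions.

```python
import math
from typing import List, Tuple

Vec = List[float]
Video = List[List[Vec]]  # frames -> regions -> feature vector


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def softmax(scores: List[float], temp: float = 0.1) -> List[float]:
    """Numerically stable softmax with a temperature."""
    m = max(scores)
    exps = [math.exp((s - m) / temp) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_stage(query: Vec, video: Video, alpha: float = 0.5, beta: float = 1.0,
                 top_k: int = 2) -> Tuple[Vec, Video, List[Tuple[int, float]]]:
    """One hypothetical stage: localize, mine knowledge, refine query and video."""
    # Localization: best-matching region index and score in every frame.
    sims = [[cosine(query, r) for r in frame] for frame in video]
    boxes = [max(enumerate(s), key=lambda p: p[1]) for s in sims]

    # Toy "appearance knowledge": mean feature of the top-k most confident
    # per-frame detections (slicing tolerates top_k > number of frames).
    ranked = sorted(range(len(video)), key=lambda t: boxes[t][1], reverse=True)[:top_k]
    appearance = [sum(video[t][boxes[t][0]][i] for t in ranked) / len(ranked)
                  for i in range(len(query))]

    # Refine the query toward target appearances actually observed in the video.
    new_query = [(1 - alpha) * q + alpha * a for q, a in zip(query, appearance)]

    # Toy "spatial knowledge": per-frame attention over regions; regions that
    # likely contain the target receive more of the appearance cue.
    new_video = []
    for frame, s in zip(video, sims):
        attn = softmax(s)
        new_video.append([[x + beta * w * a for x, a in zip(r, appearance)]
                          for r, w in zip(frame, attn)])
    return new_query, new_video, boxes


def progressive_refine(query: Vec, video: Video,
                       stages: int = 3) -> Tuple[Vec, Video, List[Tuple[int, float]]]:
    """Chain stages; returns the final refined features and last-stage detections."""
    boxes: List[Tuple[int, float]] = []
    for _ in range(stages):
        query, video, boxes = refine_stage(query, video)
    return query, video, boxes
```

Each stage consumes the previous stage's outputs. The query therefore drifts toward appearances confirmed in the video rather than staying fixed to the single query crop. The flip side is that a confident early mistake could also be reinforced, which is one reason a real system would learn these updates rather than hard-code them.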

Code & Models

Repositories

fb-reps/prvql (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques