AR2-4FV: Anchored Referring and Re-identification for Long-Term Grounding in Fixed-View Videos
Teng Yan, Yihan Liu, Jiongxu Chen, Teng Wang, Jiaqi Li, Bingzhuo Zhong

TL;DR
This paper introduces AR2-4FV, a novel framework for long-term referring in fixed-view videos that leverages background stability and an anchor bank to improve re-identification and re-capture of objects over time.
Contribution
AR2-4FV presents a new approach combining background-based anchoring, re-entry priors, and a lightweight ReID-Gating mechanism for robust long-term object grounding.
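To make the gating idea concrete, here is a minimal sketch of a displacement-based gate. It is one plausible reading of "ReID-Gating", not the authors' implementation. The names `to_anchor_frame`, `reid_gate`, the `(x, y, scale)` anchor representation, and the `max_step` threshold are all assumptions introduced for illustration.

```python
import math

def to_anchor_frame(point, anchor):
    """Express an image point relative to an anchor.

    `anchor` is a hypothetical (x, y, scale) triple: the anchor's image
    position and a characteristic size used to normalise distances.
    """
    ax, ay, scale = anchor
    return ((point[0] - ax) / scale, (point[1] - ay) / scale)

def reid_gate(last_pos, cand_pos, anchor, frames_elapsed, max_step=0.2):
    """Accept a candidate as the same identity if its displacement is plausible.

    Assumption: the target moves at most `max_step` anchor units per frame,
    so the allowed displacement grows linearly with the frames elapsed
    since the last confirmed position.
    """
    lx, ly = to_anchor_frame(last_pos, anchor)
    cx, cy = to_anchor_frame(cand_pos, anchor)
    displacement = math.hypot(cx - lx, cy - ly)
    allowed = max_step * max(1, frames_elapsed)
    return displacement <= allowed

# Illustrative values only.
anchor = (100.0, 100.0, 50.0)
near = reid_gate((120.0, 120.0), (125.0, 120.0), anchor, frames_elapsed=1)
far = reid_gate((120.0, 120.0), (300.0, 120.0), anchor, frames_elapsed=1)
far_after_gap = reid_gate((120.0, 120.0), (300.0, 120.0), anchor, frames_elapsed=30)
```

The gate uses geometry only, in keeping with the abstract's statement that appearance variations are not explicitly modeled. A real system would likely combine it with detector confidence.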
Findings
+10.3% Re-Capture Rate (RCR) over baseline
-24.2% Re-Capture Latency (RCL)
Background stability and anchor-based memory keep grounding persistent while the referent is absent
Abstract
Long-term language-guided referring in fixed-view videos is challenging: the referent may be occluded or leave the scene for long intervals and later re-enter, while framewise referring pipelines drift as re-identification (ReID) becomes unreliable. AR2-4FV leverages background stability for long-term referring. An offline Anchor Bank is distilled from static background structures; at inference, the text query is aligned with this bank to produce an Anchor Map that serves as persistent semantic memory when the referent is absent. An anchor-based re-entry prior accelerates re-capture upon return, and a lightweight ReID-Gating mechanism maintains identity continuity using displacement cues in the anchor frame. The system predicts per-frame bounding boxes without assuming the target is visible in the first frame or explicitly modeling appearance variations. AR2-4FV achieves +10.3%…
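The query-to-anchor alignment and the re-entry prior described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's method. It assumes each anchor stores an embedding and an image position, that alignment is a softmax over cosine similarities, and that the prior is a Gaussian falloff around highly weighted anchors. The functions `anchor_map` and `reentry_prior`, the `temperature` and `sigma` parameters, and the toy bank are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def softmax(scores, temperature=0.1):
    """Numerically stable softmax with a temperature."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def anchor_map(query_emb, anchor_bank, temperature=0.1):
    """Weight each background anchor by its similarity to the text query."""
    names = list(anchor_bank)
    sims = [cosine(query_emb, anchor_bank[n]["emb"]) for n in names]
    return dict(zip(names, softmax(sims, temperature)))

def reentry_prior(center, amap, anchor_bank, sigma=50.0):
    """Score a candidate box centre by proximity to highly weighted anchors."""
    score = 0.0
    for name, weight in amap.items():
        ax, ay = anchor_bank[name]["xy"]
        d2 = (center[0] - ax) ** 2 + (center[1] - ay) ** 2
        score += weight * math.exp(-d2 / (2 * sigma ** 2))
    return score

# Toy anchor bank with made-up 2-D embeddings and pixel positions.
bank = {
    "door": {"emb": [1.0, 0.0], "xy": (100.0, 200.0)},
    "desk": {"emb": [0.0, 1.0], "xy": (400.0, 300.0)},
}
query = [0.9, 0.1]  # stands in for the embedding of a query such as "the person near the door"
amap = anchor_map(query, bank)
prior_near_door = reentry_prior((105.0, 205.0), amap, bank)
prior_near_desk = reentry_prior((400.0, 300.0), amap, bank)
```

Because the bank is built from static background, the map can be computed while the referent is out of view. In this sketch the prior would then re-rank detector candidates when the referent returns.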
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Human Pose and Action Recognition
