QueST: Persistent Queries as Semantic Monitors for Drift Suppression in Long-Horizon Tracking
Mayank Anand, Mohammad Saqlain, Kyan Mahajan, Priya Shukla, Gora Chand Nandi, Andrew Melnik

TL;DR
QueST introduces a semantic monitoring framework for long-horizon video tracking, reducing drift and improving identity preservation by treating entities as persistent queries with physical grounding.
Contribution
It proposes a novel monitoring-by-design approach that uses persistent semantic queries and geometric constraints to enhance long-term tracking accuracy.
Findings
QueST reduces terminal drift, measured as APE, by 67.7% relative to TAP-Net.
It outperforms RAFT-3D, CoTracker, and TAP-Net on long-horizon sequences.
Embedding semantic monitoring improves tracking under distribution shift.
Abstract
Tracking points in videos is typically formulated as frame-to-frame correspondence, where each point is matched locally to the next frame. While this works over short horizons, errors accumulate under articulation, occlusion, and viewpoint change, leading to silent semantic drift that existing trackers cannot detect or correct. In this work, we revisit long-horizon tracking from a monitoring perspective and introduce QueST, a monitoring-by-design framework that treats interaction-relevant entities as persistent semantic queries rather than transient point tracks. Instead of local propagation, each query attends globally over spatio-temporal video features at every time-step, providing a stable semantic anchor across time. We further constrain query trajectories with lightweight 3D physical grounding, using geometric plausibility to suppress unbounded drift under occlusion. We evaluate…
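The two mechanisms named in the abstract (a persistent query attending over video features, and a geometric plausibility constraint on the resulting trajectory) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `attend_query` and `clamp_steps`, the array shapes, the soft-argmax readout, and the per-step displacement bound `max_step` are all assumptions made for illustration; the abstract does not specify how attention is read out or what form the 3D constraint takes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_query(query, feats, coords):
    """Localize one persistent query in every frame (hypothetical readout).

    query:  (D,)      fixed semantic embedding reused at every time-step
    feats:  (T, N, D) video features, N locations per frame
    coords: (T, N, 3) assumed 3D position of each feature location
    returns (T, 3)    attention-weighted position per frame
    """
    d = query.shape[0]
    logits = feats @ query / np.sqrt(d)      # (T, N) scaled dot-product scores
    weights = softmax(logits, axis=1)        # normalize over locations in each frame
    return np.einsum("tn,tnc->tc", weights, coords)

def clamp_steps(traj, max_step):
    """Assumed plausibility rule: limit 3D displacement between consecutive frames."""
    out = traj.astype(float)                 # astype returns a copy by default
    for t in range(1, len(out)):
        delta = out[t] - out[t - 1]
        dist = np.linalg.norm(delta)
        if dist > max_step:
            out[t] = out[t - 1] + delta * (max_step / dist)
    return out
```

Because the same `query` vector is matched against every frame rather than propagated from the previous frame's estimate, a localization error in one frame does not feed into the next. The clamp then bounds how far the estimate can jump when the attention readout is unreliable, for example under occlusion.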
