TAPFormer: Robust Arbitrary Point Tracking via Transient Asynchronous Fusion of Frames and Events
Jiaxiong Liu, Zhen Tan, Jinpu Zhang, Yi Zhou, Hui Shen, Xieyuanli Chen, Dewen Hu

TL;DR
TAPFormer introduces a transformer-based framework that asynchronously fuses frames and event data for robust, high-frequency point tracking, effectively handling temporal misalignment and modality failures in challenging conditions.
Contribution
The paper presents TAPFormer, a novel asynchronous fusion mechanism and adaptive attention module for improved point tracking using combined frame and event data.
Findings
Outperforms existing trackers with 28.2% lower pixel error
Achieves state-of-the-art results on standard benchmarks
Demonstrates robustness under challenging illumination and motion conditions
Abstract
Tracking any point (TAP) is a fundamental yet challenging task in computer vision, requiring high precision and long-term motion reasoning. Recent attempts to combine RGB frames and event streams have shown promise, yet they typically rely on synchronous or non-adaptive fusion, leading to temporal misalignment and severe degradation when one modality fails. We introduce TAPFormer, a transformer-based framework that performs asynchronous temporal-consistent fusion of frames and events for robust and high-frequency arbitrary point tracking. Our key innovation is a Transient Asynchronous Fusion (TAF) mechanism, which explicitly models the temporal evolution between discrete frames through continuous event updates, bridging the gap between low-rate frames and high-rate events. In addition, a Cross-modal Locally Weighted Fusion (CLWF) module adaptively adjusts spatial attention according to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVideo Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Optical Sensing Technologies
