Tracking Objects as Pixel-wise Distributions
Zelin Zhao, Ze Wu, Yueqing Zhuang, Boxun Li, Jiaya Jia

TL;DR
This paper introduces P3AFormer, a novel transformer-based multi-object tracking method that models objects as pixel-wise distributions, achieving state-of-the-art accuracy on multiple benchmarks.
Contribution
It proposes tracking objects as pixel-wise distributions with a transformer architecture, enabling improved association and propagation of object features across frames.
Findings
Achieves 81.2% MOTA on MOT17, the first transformer network to surpass 80%.
Outperforms existing methods on MOT20 and KITTI benchmarks.
Introduces pixel-wise association for object connection recovery.
Abstract
Multi-object tracking (MOT) requires detecting and associating objects through frames. Unlike tracking via detected bounding boxes or tracking objects as points, we propose tracking objects as pixel-wise distributions. We instantiate this idea on a transformer-based architecture, P3AFormer, with pixel-wise propagation, prediction, and association. P3AFormer propagates pixel-wise features guided by flow information to pass messages between frames. Furthermore, P3AFormer adopts a meta-architecture to produce multi-scale object feature maps. During inference, a pixel-wise association procedure is proposed to recover object connections through frames based on the pixel-wise prediction. P3AFormer yields 81.2\% in terms of MOTA on the MOT17 benchmark -- the first among all transformer networks to reach 80\% MOTA in literature. P3AFormer also outperforms state-of-the-arts on the MOT20 and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsVideo Surveillance and Tracking Methods · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
