MixFormer: End-to-End Tracking with Iterative Mixed Attention
Yutao Cui, Cheng Jiang, Limin Wang, Gangshan Wu

TL;DR
MixFormer is a transformer-based tracking framework that unifies feature extraction and target information integration using a novel mixed attention module, achieving state-of-the-art results across multiple benchmarks.
Contribution
The paper introduces MixFormer, a new transformer-based tracking architecture with a mixed attention module for simultaneous feature extraction and target integration, simplifying the tracking pipeline.
Findings
Achieves state-of-the-art performance on five tracking benchmarks: LaSOT, TrackingNet, VOT2020, GOT-10k, and UAV123.
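Introduces an asymmetric attention scheme within the mixed attention module for handling multiple target templates during online tracking.
Demonstrates the effectiveness of performing feature extraction and target information integration synchronously rather than in separate stages.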
Abstract
Tracking often uses a multi-stage pipeline of feature extraction, target information integration, and bounding box estimation. To simplify this pipeline and unify feature extraction and target information integration, we present a compact tracking framework, termed MixFormer, built upon transformers. Our core design exploits the flexibility of attention operations through a Mixed Attention Module (MAM) that performs feature extraction and target information integration simultaneously. This synchronous modeling scheme allows the model to extract target-specific discriminative features and to perform extensive communication between the target and the search area. Based on MAM, we build our MixFormer tracking framework simply by stacking multiple MAMs with progressive patch embedding and placing a localization head on top. In addition, to handle multiple target templates during online tracking,…
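The mixed attention idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes single-head attention over the concatenation of template and search tokens, and the function name `mixed_attention`, the weight matrices, and the token shapes are all hypothetical. The `asymmetric` flag reflects an assumption about the asymmetric scheme: template queries are blocked from attending to search keys, so template features no longer depend on the search area.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mixed_attention(template, search, w_q, w_k, w_v, asymmetric=False):
    """Hypothetical single-head mixed attention.

    template: (n_t, d) target template tokens
    search:   (n_s, d) search-area tokens
    Returns updated template tokens, updated search tokens, and the
    full (n_t + n_s, n_t + n_s) attention matrix.
    """
    n_t = template.shape[0]
    # One token sequence, so self-attention (feature extraction) and
    # cross-attention (target integration) happen in the same operation.
    tokens = np.concatenate([template, search], axis=0)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if asymmetric:
        # Assumption: template queries do not attend to search keys.
        scores[:n_t, n_t:] = -np.inf
    attn = softmax(scores, axis=-1)
    out = attn @ v
    return out[:n_t], out[n_t:], attn

rng = np.random.default_rng(0)
d = 8
template = rng.normal(size=(4, d))   # 4 template tokens
search = rng.normal(size=(9, d))     # 9 search tokens
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

t_out, s_out, attn = mixed_attention(
    template, search, w_q, w_k, w_v, asymmetric=True
)
print(t_out.shape, s_out.shape, attn.shape)
```

Under this masking assumption, the template output is identical for every search frame, so it could be computed once and reused, while search tokens still attend to both the template and themselves.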
Taxonomy
Topics: Video Surveillance and Tracking Methods · Air Quality Monitoring and Forecasting · Impact of Light on Environment and Health
