Search Multilayer Perceptron-Based Fusion for Efficient and Accurate Siamese Tracking
Tianqi Shen, Huakao Lin, Ning An

TL;DR
This paper introduces a novel MLP-based fusion module for Siamese trackers that improves pixel-level interaction efficiency, balancing accuracy and computational cost through neural architecture search, and achieves state-of-the-art results on multiple benchmarks.
Contribution
It proposes a simple MLP-based fusion module with a differentiable architecture search strategy to optimize channel width and depth, enhancing tracker efficiency and accuracy.
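As a rough illustration of how channel width can be made searchable by gradient descent, the sketch below uses a generic softmax relaxation over candidate widths, in the style of common DNAS methods. It is not the paper's customized relaxation strategy, whose details are not given here; the function names, candidate widths, and logits are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def relaxed_channel_mask(logits, widths, max_width):
    """Softmax-weighted sum of binary channel masks, one mask per candidate width.

    Assumption: each candidate width w keeps the first w channels. The result
    is a soft mask that is differentiable with respect to the logits.
    """
    probs = softmax(logits)
    mask = [0.0] * max_width
    for p, w in zip(probs, widths):
        for i in range(w):
            mask[i] += p
    return mask

def expected_width(logits, widths):
    """Expected channel width under the relaxed (softmax) distribution."""
    probs = softmax(logits)
    return sum(p * w for p, w in zip(probs, widths))

# Hypothetical search over two candidate widths with equal logits.
mask = relaxed_channel_mask([0.0, 0.0], widths=[2, 4], max_width=4)
avg_width = expected_width([0.0, 0.0], widths=[2, 4])
print(mask, avg_width)
```

Multiplying a layer's output channels by such a mask lets the architecture logits be trained jointly with the network weights; after the search, the width with the largest logit would typically be kept.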
Findings
Achieves state-of-the-art accuracy-efficiency trade-offs.
Ranks among the top trackers on multiple tracking benchmarks.
Maintains real-time performance on resource-constrained hardware.
Abstract
Siamese visual trackers have recently advanced through increasingly sophisticated fusion mechanisms built on convolutional or Transformer architectures. However, both struggle to deliver pixel-level interactions efficiently on resource-constrained hardware, leading to a persistent accuracy-efficiency imbalance. Motivated by this limitation, we redesign the Siamese neck with a simple yet effective Multilayer Perceptron (MLP)-based fusion module that enables pixel-level interaction with minimal structural overhead. Nevertheless, naively stacking MLP blocks introduces a new challenge: computational cost can scale quadratically with channel width. To overcome this, we construct a hierarchical search space of carefully designed MLP modules and introduce a customized relaxation strategy that enables differentiable neural architecture search (DNAS) to decouple channel-width optimization from…
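The quadratic scaling mentioned in the abstract can be seen from a simple operation count. The sketch below assumes a two-layer channel MLP applied independently at each pixel (C -> e*C -> C); this block shape, the `expansion` parameter, and the 16x16 feature map are illustrative assumptions, not the paper's actual module.

```python
def mlp_block_macs(channels: int, num_pixels: int, expansion: int = 1) -> int:
    """Multiply-accumulates for a per-pixel two-layer MLP: C -> e*C -> C.

    Assumption: biases and activations are ignored, and the same MLP is
    applied at every pixel of the feature map.
    """
    hidden = expansion * channels
    per_pixel = channels * hidden + hidden * channels  # two dense layers
    return num_pixels * per_pixel

# Doubling the channel width quadruples the cost at a fixed spatial size.
for c in (64, 128, 256):
    print(c, mlp_block_macs(c, num_pixels=16 * 16))
```

Because the cost grows with the square of the width, a small reduction in channels at a wide layer saves more computation than the same reduction at a narrow one, which is one reason to search widths per layer rather than fix them globally.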
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Face recognition and analysis
