TL;DR
SEATrack is a two-stream multimodal tracker that improves cross-modal alignment and global relation modeling, aiming for a better balance of accuracy and efficiency across RGB-T, RGB-D, and RGB-E tracking.
Contribution
SEATrack introduces two modules: AMG-LoRA for dynamic attention alignment across modalities, and HMoE for efficient global relation modeling. Together they target higher tracking accuracy without inflating the parameter budget.
Findings
Outperforms state-of-the-art methods in RGB-T, RGB-D, and RGB-E tracking.
Achieves a better balance of accuracy and computational efficiency.
Demonstrates the effectiveness of AMG-LoRA and HMoE modules.
Abstract
Parameter-efficient fine-tuning (PEFT) in multimodal tracking reveals a concerning trend where recent performance gains are often achieved at the cost of inflated parameter budgets, which fundamentally erodes PEFT's efficiency promise. In this work, we introduce SEATrack, a Simple, Efficient, and Adaptive two-stream multimodal tracker that tackles this performance-efficiency dilemma from two complementary perspectives. We first prioritize cross-modal alignment of matching responses, an underexplored yet pivotal factor that we argue is essential for breaking the trade-off. Specifically, we observe that modality-specific biases in existing two-stream methods generate conflicting matching attention maps, thereby hindering effective joint representation learning. To mitigate this, we propose AMG-LoRA, which seamlessly integrates Low-Rank Adaptation (LoRA) for domain adaptation with Adaptive…
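For readers unfamiliar with the LoRA building block named in the abstract, here is a minimal sketch of generic Low-Rank Adaptation on a single linear layer. It is not the paper's AMG-LoRA, whose details are not given in the text above. The class name `LoRALinear`, the helper functions, and the initialization constants are illustrative assumptions. The only idea shown is standard LoRA: a frozen weight `W` plus a trainable low-rank update `B @ A` scaled by `alpha / rank`.

```python
import random


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Hypothetical illustration: frozen W (d_out x d_in) plus low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weight, never updated
        self.rank = rank
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the update is zero at init.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def trainable_params(self):
        """Only A and B are trained: rank * (d_in + d_out) values."""
        return self.rank * (len(self.weight[0]) + len(self.weight))

    def forward(self, x):
        """x: batch of row vectors (n x d_in); returns (n x d_out)."""
        base = matmul(x, transpose(self.weight))                      # x W^T
        low = matmul(matmul(x, transpose(self.A)), transpose(self.B))  # x A^T B^T
        return [[b + self.scale * l for b, l in zip(rb, rl)]
                for rb, rl in zip(base, low)]


# Usage: a 2x3 frozen weight adapted with a rank-1 update.
layer = LoRALinear([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], rank=1)
y_init = layer.forward([[1.0, 2.0, 3.0]])   # equals the frozen layer's output at init
```

Because `B` is initialized to zero, the adapted layer reproduces the frozen layer exactly before training, and only `rank * (d_in + d_out)` values are trainable. That small count is the parameter budget the abstract's efficiency argument is concerned with.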
