LightFC-X: Lightweight Convolutional Tracker for RGB-X Tracking
Yunfeng Li, Bo Wang, Ye Li

TL;DR
LightFC-X is a family of lightweight convolutional RGB-X trackers aimed at resource-limited devices. Two new modules, an efficient cross-attention module (ECAM) and a spatiotemporal template aggregation module (STAM), let it balance high tracking performance with low computational cost.
Contribution
The paper presents LightFC-X, a lightweight convolutional RGB-X tracker with two new modules: ECAM for efficient cross-modal interaction and STAM for temporal feature aggregation.
Findings
Achieves state-of-the-art performance with fewer parameters.
Runs in real time at 22 fps on CPU.
Outperforms previous methods such as CMD on the LasHeR benchmark.
Abstract
Despite great progress in multimodal tracking, existing trackers remain too heavy and expensive for resource-constrained devices. To alleviate this problem, we propose LightFC-X, a family of lightweight convolutional RGB-X trackers that explores a unified convolutional architecture for lightweight multimodal tracking. Our core idea is to achieve lightweight cross-modal modeling and joint refinement of the multimodal features and the spatiotemporal appearance features of the target. Specifically, we propose a novel efficient cross-attention module (ECAM) and a novel spatiotemporal template aggregation module (STAM). The ECAM achieves lightweight cross-modal interaction of the integrated template and search-area features with only 0.08M parameters. The STAM enhances the model's utilization of temporal information through a module fine-tuning paradigm. Comprehensive experiments show that our LightFC-X…
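To make the idea of cross-modal interaction concrete, here is a minimal sketch of generic scaled dot-product cross-attention in plain Python, where tokens from one modality (say RGB) query tokens from the other (the X modality). This is not the ECAM design, which the abstract does not detail; the function names `cross_attention` and `fuse`, the absence of learned projections, and the residual addition are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic single-head scaled dot-product cross-attention.

    queries: N tokens of dimension d from one modality (e.g. RGB).
    keys, values: M tokens from the other modality (e.g. thermal or depth).
    Returns N tokens, each a weighted average of the value tokens.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        # Similarity of this query token to every key token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value tokens, one output channel at a time.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def fuse(rgb_tokens, x_tokens):
    """Hypothetical fusion step: RGB tokens attend to X tokens, then a
    residual addition keeps the original RGB information."""
    attended = cross_attention(rgb_tokens, x_tokens, x_tokens)
    return [[r + a for r, a in zip(rt, at)]
            for rt, at in zip(rgb_tokens, attended)]


if __name__ == "__main__":
    rgb = [[1.0, 0.0], [0.0, 1.0]]
    x = [[0.5, 0.5], [0.5, 0.5]]
    print(fuse(rgb, x))
```

A learned module would normally add projection layers for the queries, keys, and values; those layers are where a parameter budget such as the 0.08M quoted above would be spent.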
Taxonomy
Topics: Industrial Vision Systems and Defect Detection · CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies
Methods: Softmax · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Concatenated Skip Connection
