LightFC-X: Lightweight Convolutional Tracker for RGB-X Tracking

Yunfeng Li; Bo Wang; Ye Li

arXiv:2502.18143·cs.CV·February 26, 2025

TL;DR

LightFC-X is a lightweight multimodal (RGB-X) tracking framework that combines strong tracking performance with low computational cost, making it suitable for resource-limited devices. It relies on two new modules, ECAM and STAM.

Contribution

The paper presents LightFC-X, a lightweight convolutional RGB-X tracker with two new modules: ECAM (efficient cross-attention module) for lightweight cross-modal interaction, and STAM (spatiotemporal template aggregation module) for temporal feature aggregation.

Findings

01 Achieves state-of-the-art performance with fewer parameters.

02 Runs in real time at 22 fps on CPU.

03 Outperforms previous methods such as CMD on the LasHeR benchmark.

Abstract

Despite great progress in multimodal tracking, existing trackers remain too heavy and expensive for resource-constrained devices. To alleviate this problem, we propose LightFC-X, a family of lightweight convolutional RGB-X trackers that explores a unified convolutional architecture for lightweight multimodal tracking. Our core idea is to achieve lightweight cross-modal modeling and joint refinement of the multimodal features and the spatiotemporal appearance features of the target. Specifically, we propose a novel efficient cross-attention module (ECAM) and a novel spatiotemporal template aggregation module (STAM). The ECAM achieves lightweight cross-modal interaction of the integrated template-search area feature with only 0.08M parameters. The STAM enhances the model's utilization of temporal information through a module fine-tuning paradigm. Comprehensive experiments show that our LightFC-X…
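
To make the idea of cross-modal interaction via cross-attention concrete, here is a minimal sketch of generic scaled dot-product cross-attention in plain Python. It is not the authors' ECAM: the function names, the token shapes, and the absence of learned projections are all assumptions made for illustration. It only shows how features from one modality (queries) can be re-weighted by features from another modality (keys and values).

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative, no learned weights).

    queries: tokens from one modality, e.g. RGB (list of d-dim vectors)
    keys, values: tokens from the other modality, e.g. X (lists of vectors)
    Returns one fused vector per query token.
    """
    d = len(queries[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy tokens: two RGB tokens attend over three X-modality tokens.
rgb_tokens = [[1.0, 0.0], [0.0, 1.0]]
x_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(rgb_tokens, x_tokens, x_tokens)
```

Each output vector is a convex combination of the X-modality tokens, weighted toward those most similar to the corresponding RGB token. A practical module would add learned query, key, and value projections, which is where the parameter count comes from.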

Code & Models

Repositories

liyunfenglyf/lightfc-x (PyTorch, official)

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies

Methods: Softmax · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Concatenated Skip Connection