A Dual-Stream Transformer Architecture for Illumination-Invariant TIR-LiDAR Person Tracking
Yuki Minase, Kanji Tanaka

TL;DR
This paper introduces a thermal-infrared and depth (TIR-D) tracking architecture for robust person tracking under diverse lighting conditions, using a knowledge transfer strategy and differential learning rates to improve performance.
Contribution
It presents a dual-stream transformer model for TIR-D tracking, together with a sequential knowledge transfer method and a differential learning rate strategy that improve robustness.
Findings
Achieved an average overlap (AO) of 0.700 and a success rate (SR) of 58.7% on TIR-D tracking benchmarks.
Outperformed conventional RGB-transfer and single-modality baselines.
Demonstrated effective adaptation to geometric depth cues in challenging environments.
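The AO and SR figures above are standard single-object tracking metrics. The sketch below shows how they are commonly computed, assuming axis-aligned `(x, y, w, h)` boxes and an SR threshold of 0.5 (as in GOT-10k's SR at 0.5). This page does not state the paper's exact threshold or aggregation protocol, and benchmark toolkits may average per sequence before averaging across sequences, so treat this as an illustration of the definitions only. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap along each axis, clamped at zero when the boxes do not meet.
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def frame_ious(preds: Sequence[Box], gts: Sequence[Box]) -> list:
    if len(preds) != len(gts):
        raise ValueError("predictions and ground truth must have equal length")
    return [iou(p, g) for p, g in zip(preds, gts)]


def average_overlap(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """AO: mean IoU over all frames of one sequence."""
    ious = frame_ious(preds, gts)
    return sum(ious) / len(ious)


def success_rate(preds: Sequence[Box], gts: Sequence[Box], threshold: float = 0.5) -> float:
    """SR: fraction of frames whose IoU exceeds the (assumed) threshold."""
    ious = frame_ious(preds, gts)
    return sum(1 for v in ious if v > threshold) / len(ious)


# Usage: one perfect frame and one frame with a shifted box.
preds = [(0, 0, 10, 10), (5, 5, 10, 10)]
gts = [(0, 0, 10, 10), (0, 0, 10, 10)]
ao = average_overlap(preds, gts)  # (1.0 + 25/175) / 2
sr = success_rate(preds, gts)     # only the first frame exceeds 0.5
```

Because SR counts frames above a fixed threshold while AO averages raw overlaps, the two numbers can move independently, which is why trackers usually report both.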
Abstract
Robust person tracking is a critical capability for autonomous mobile robots operating in diverse and unpredictable environments. While RGB-D tracking has shown high precision, its performance severely degrades under challenging illumination conditions, such as total darkness or intense backlighting. To achieve all-weather robustness, this paper proposes a novel Thermal-Infrared and Depth (TIR-D) tracking architecture that leverages the standard sensor suite of SLAM-capable robots, namely LiDAR and TIR cameras. A major challenge in TIR-D tracking is the scarcity of annotated multi-modal datasets. To address this, we introduce a sequential knowledge transfer strategy that evolves structural priors from a large-scale thermal-trained model into the TIR-D domain. By employing a differential learning rate strategy, referred to as the "Fine-grained Differential Learning Rate Strategy", we…
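The abstract describes transferring priors from a thermal-trained model into a TIR-D model and then fine-tuning with differential learning rates. The sketch below shows one possible reading of those two steps and is not the authors' implementation. It assumes, hypothetically, that the thermal checkpoint's `backbone.*` weights seed both the thermal and the depth streams, and that learning rates are scaled per module prefix. The prefixes (`thermal_backbone.`, `depth_backbone.`, `fusion.`), the multipliers, and both function names are illustrative. Plain lists stand in for tensors so the example runs without a deep learning framework.

```python
from typing import Dict, List, Mapping, Sequence


def transfer_thermal_weights(thermal_state: Mapping[str, List[float]]) -> Dict[str, List[float]]:
    """Initialise both streams of a hypothetical TIR-D model from a thermal checkpoint.

    Assumption: every 'backbone.*' parameter is copied into the thermal stream and
    the depth stream; all other parameters are carried over unchanged.
    """
    tird_state: Dict[str, List[float]] = {}
    for name, weights in thermal_state.items():
        if name.startswith("backbone."):
            suffix = name[len("backbone."):]
            tird_state["thermal_backbone." + suffix] = list(weights)  # independent copy
            tird_state["depth_backbone." + suffix] = list(weights)    # independent copy
        else:
            tird_state[name] = list(weights)
    return tird_state


# Hypothetical per-prefix learning-rate multipliers (not taken from the paper).
LR_MULTIPLIERS = {
    "thermal_backbone.": 0.1,  # keep transferred thermal priors mostly intact
    "depth_backbone.": 0.5,    # thermal-initialised, but must adapt to depth geometry
    "fusion.": 1.0,            # newly added cross-modal layers train at the full rate
}


def build_param_groups(
    param_names: Sequence[str],
    base_lr: float,
    multipliers: Mapping[str, float] = LR_MULTIPLIERS,
) -> List[dict]:
    """Group parameter names by learning rate; names matching no prefix get base_lr."""
    groups: Dict[float, List[str]] = {}
    for name in param_names:
        scale = next((m for prefix, m in multipliers.items() if name.startswith(prefix)), 1.0)
        groups.setdefault(base_lr * scale, []).append(name)
    # Sorted from the smallest to the largest learning rate.
    return [{"lr": lr, "params": names} for lr, names in sorted(groups.items())]


# Usage with a toy thermal checkpoint and one new fusion layer.
thermal_ckpt = {"backbone.block0.weight": [0.2, -0.1], "head.weight": [1.0]}
state = transfer_thermal_weights(thermal_ckpt)
names = list(state) + ["fusion.attn.weight"]
groups = build_param_groups(names, base_lr=1e-4)
```

The list-of-dicts shape mirrors the parameter-group format accepted by PyTorch optimizers such as `torch.optim.AdamW`, where `"params"` would hold tensors rather than names. A "fine-grained" variant could assign a distinct multiplier per layer instead of per stream.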
