A Dual-Stream Transformer Architecture for Illumination-Invariant TIR-LiDAR Person Tracking

Yuki Minase; Kanji Tanaka

arXiv:2604.00363 · cs.RO · April 2, 2026


TL;DR

This paper introduces a Thermal-Infrared and Depth (TIR-D) architecture for person tracking that stays robust under difficult lighting, such as total darkness or intense backlighting. It combines sequential knowledge transfer from a thermal-trained model with a differential learning rate strategy.

Contribution

The paper presents a dual-stream transformer model for TIR-D tracking, together with a sequential knowledge transfer method and a fine-grained differential learning rate strategy, to improve robustness when annotated multi-modal data is scarce.
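
As a rough illustration of what a differential learning rate can look like in general, the sketch below assigns a smaller rate to parameters inherited from a pretrained model and a larger one to newly added parameters. The summary does not give the paper's actual grouping or values, so the parameter names (`tir_stream`, `depth_stream`, `fusion_head`), the base rate, and the multipliers are all hypothetical.

```python
from typing import Dict, List


def assign_learning_rates(
    param_names: List[str],
    base_lr: float,
    multipliers: Dict[str, float],
    default_multiplier: float = 1.0,
) -> Dict[str, float]:
    """Map each parameter name to base_lr scaled by the first matching prefix."""
    rates: Dict[str, float] = {}
    for name in param_names:
        # Use the multiplier of the first prefix that matches, else the default.
        mult = next(
            (m for prefix, m in multipliers.items() if name.startswith(prefix)),
            default_multiplier,
        )
        rates[name] = base_lr * mult
    return rates


# Hypothetical parameter names for a two-stream model (not from the paper).
names = [
    "tir_stream.block0.weight",    # assumed to be initialized from a thermal-trained model
    "depth_stream.block0.weight",  # assumed to be newly adapted to depth input
    "fusion_head.weight",          # assumed to be trained from scratch
]

# Assumed multipliers: slow updates for pretrained weights, full rate elsewhere.
rates = assign_learning_rates(
    names,
    base_lr=1e-4,
    multipliers={"tir_stream": 0.1, "depth_stream": 1.0},
)

for name, lr in rates.items():
    print(f"{name}: lr={lr:.1e}")
```

In a training framework, per-parameter rates like these would typically be passed to the optimizer as separate parameter groups, each with its own `lr`.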

Findings

01

Achieved an AO of 0.700 and SR of 58.7% on TIR-D tracking benchmarks.

02

Outperformed conventional RGB-transfer and single-modality baselines.

03

Demonstrated effective adaptation to geometric depth cues in challenging environments.
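
The findings above report AO and SR without defining them. A minimal sketch follows, assuming the common tracking-benchmark reading: AO is the average overlap (mean IoU between predicted and ground-truth boxes over all frames), and SR is the success rate (fraction of frames whose IoU exceeds a threshold). The `(x, y, width, height)` box format and the 0.5 threshold are assumptions, since the summary does not state the evaluation protocol.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """Mean IoU over all frames of a sequence."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(overlaps) / len(overlaps)


def success_rate(preds: Sequence[Box], gts: Sequence[Box], threshold: float = 0.5) -> float:
    """Fraction of frames whose IoU is strictly above the threshold (0.5 is assumed)."""
    overlaps = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(o > threshold for o in overlaps) / len(overlaps)


# Toy two-frame sequence: a perfect match, then a box shifted by half its width.
preds = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
gts = [(0.0, 0.0, 10.0, 10.0), (5.0, 0.0, 10.0, 10.0)]

ao = average_overlap(preds, gts)
sr = success_rate(preds, gts)
print(f"AO={ao:.3f}, SR={sr:.1%}")
```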

Abstract

Robust person tracking is a critical capability for autonomous mobile robots operating in diverse and unpredictable environments. While RGB-D tracking has shown high precision, its performance severely degrades under challenging illumination conditions, such as total darkness or intense backlighting. To achieve all-weather robustness, this paper proposes a novel Thermal-Infrared and Depth (TIR-D) tracking architecture that leverages the standard sensor suite of SLAM-capable robots, namely LiDAR and TIR cameras. A major challenge in TIR-D tracking is the scarcity of annotated multi-modal datasets. To address this, we introduce a sequential knowledge transfer strategy that evolves structural priors from a large-scale thermal-trained model into the TIR-D domain. By employing a differential learning rate strategy, referred to as the "Fine-grained Differential Learning Rate Strategy", we…
