Multi-Modal Fusion for End-to-End RGB-T Tracking
Lichao Zhang, Martin Danelljan, Abel Gonzalez-Garcia, Joost van de Weijer, Fahad Shahbaz Khan

TL;DR
This paper introduces an end-to-end RGB-T tracking framework that fuses RGB and thermal infrared (TIR) data at the pixel, feature, and response levels, reporting improved tracking performance and state-of-the-art results on benchmark datasets.
Contribution
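The paper presents an end-to-end fusion approach for RGB-T tracking built on DiMP. It analyzes fusion mechanisms in each main component of the tracker and generates training data by synthesizing paired TIR images for an annotated RGB tracking dataset (GOT-10k).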
Findings
Feature-level fusion yields the best performance.
End-to-end training enhances modality integration.
Achieves state-of-the-art results on the RGBT210 dataset.
Abstract
We propose an end-to-end tracking framework for fusing the RGB and TIR modalities in RGB-T tracking. Our baseline tracker is DiMP (Discriminative Model Prediction), which employs a carefully designed target prediction network trained end-to-end using a discriminative loss. We analyze the effectiveness of modality fusion in each of the main components in DiMP, i.e. feature extractor, target estimation network, and classifier. We consider several fusion mechanisms acting at different levels of the framework, including pixel-level, feature-level and response-level. Our tracker is trained in an end-to-end manner, enabling the components to learn how to fuse the information from both modalities. As data to train our model, we generate a large-scale RGB-T dataset by considering an annotated RGB tracking dataset (GOT-10k) and synthesizing paired TIR images using an image-to-image translation…
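The three fusion levels named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the array shapes, the use of channel concatenation followed by a 1x1 convolution for feature-level fusion, and the element-wise sum for response-level fusion are all assumptions chosen for illustration.

```python
import numpy as np


def fuse_pixel_level(rgb, tir):
    """Pixel-level fusion (illustrative): stack the raw images along channels.

    rgb: array of shape (3, H, W); tir: array of shape (1, H, W).
    Returns a (4, H, W) array that a single feature extractor could consume.
    """
    return np.concatenate([rgb, tir], axis=0)


def fuse_feature_level(feat_rgb, feat_tir, weight):
    """Feature-level fusion (illustrative): concatenate, then mix channels.

    feat_rgb, feat_tir: feature maps of shape (C, H, W), one per modality.
    weight: hypothetical 1x1 convolution kernel of shape (C_out, 2 * C).
    A 1x1 convolution is a per-pixel linear map over channels, so it can be
    written as a single einsum.
    """
    stacked = np.concatenate([feat_rgb, feat_tir], axis=0)  # (2C, H, W)
    return np.einsum("oc,chw->ohw", weight, stacked)        # (C_out, H, W)


def fuse_response_level(score_rgb, score_tir):
    """Response-level fusion (illustrative): add the two score maps.

    score_rgb, score_tir: classifier responses of shape (H, W).
    """
    return score_rgb + score_tir


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.random((3, 8, 8))
    tir = rng.random((1, 8, 8))
    print(fuse_pixel_level(rgb, tir).shape)

    feat_rgb = rng.random((16, 4, 4))
    feat_tir = rng.random((16, 4, 4))
    weight = rng.random((16, 32))  # untrained placeholder, learned in practice
    print(fuse_feature_level(feat_rgb, feat_tir, weight).shape)

    print(fuse_response_level(rng.random((4, 4)), rng.random((4, 4))).shape)
```

In an end-to-end setting, the fusion weights (here the placeholder `weight`) would be learned jointly with the rest of the tracker rather than fixed by hand.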
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Image Fusion Techniques · Image Enhancement Techniques
Methods: Convolution · 1x1 Convolution · Feature Pyramid Network · Region Proposal Network · Precise RoI Pooling · Dense Connections · IoU-Net
