M$^5$L: Multi-Modal Multi-Margin Metric Learning for RGBT Tracking
Zhengzheng Tu, Chun Lin, Chenglong Li, Jin Tang, Bin Luo

TL;DR
This paper introduces M$^5$L, a novel multi-modal metric learning framework for RGBT tracking that enhances sample distinction and modality fusion, leading to improved tracking accuracy.
Contribution
The paper proposes a multi-margin structured loss and cross-modality constraints to better distinguish confusing samples and fuse RGB and thermal features effectively.
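To make the multi-margin idea concrete, here is a minimal sketch of a hinge-style loss with several predefined margins. It is not the paper's actual formulation, which this page does not give. The function name, the three margin values, the use of group means, and the assumed distance ordering from the anchor (normal positives, then confusing positives, then confusing negatives, then normal negatives) are all illustrative assumptions.

```python
import numpy as np


def multi_margin_loss(d_pos, d_conf_pos, d_conf_neg, d_neg,
                      m_pos=0.2, m_mid=0.5, m_neg=0.2):
    """Hypothetical multi-margin hinge loss (illustrative, not the paper's loss).

    Each d_* argument is a 1-D array of distances from the anchor to the
    samples of one group. The assumed target ordering of mean distances is:
    normal positives < confusing positives < confusing negatives < normal negatives,
    with a separate margin enforced between each adjacent pair of groups.
    """
    means = [float(np.mean(np.asarray(d, dtype=float)))
             for d in (d_pos, d_conf_pos, d_conf_neg, d_neg)]
    margins = (m_pos, m_mid, m_neg)

    loss = 0.0
    for closer, farther, margin in zip(means[:-1], means[1:], margins):
        # Penalize only when the farther group is not at least `margin`
        # beyond the closer group.
        loss += max(closer + margin - farther, 0.0)
    return loss


# Well-separated groups incur no penalty.
separated = multi_margin_loss([0.1, 0.2], [0.5, 0.6], [1.2, 1.3], [1.8, 2.0])

# Groups collapsed to the same distance violate every margin in full.
collapsed = multi_margin_loss([1.0], [1.0], [1.0], [1.0])
```

Under these assumptions, the loss is zero once each pair of adjacent groups is separated by its margin, and it grows linearly with the size of any violation. Using a larger middle margin than the two outer ones is one way to keep the positive/negative boundary wider than the boundaries between confusing and normal samples.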
Findings
Outperforms state-of-the-art RGBT trackers on large-scale datasets.
Improves tracking accuracy by enlarging margins between confusing and normal samples.
Enhances feature fusion with modality attention mechanisms.
Abstract
Classifying confusing samples during RGBT tracking is a challenging problem that still lacks a satisfactory solution. Existing methods focus only on enlarging the boundary between positive and negative samples; however, the structured information of the samples might be harmed, e.g., confusing positive samples are closer to the anchor than normal positive samples. To handle this problem, we propose a novel Multi-Modal Multi-Margin Metric Learning framework, named M$^5$L, for RGBT tracking. In particular, we design a multi-margin structured loss to distinguish the confusing samples, which play a critical role in boosting tracking performance. To alleviate this problem, we additionally enlarge the boundaries between confusing positive samples and normal ones, and between confusing negative samples and normal ones, with predefined margins, by exploiting the…
Taxonomy
Topics: Video Surveillance and Tracking Methods · Infrared Thermography in Medicine · Human Pose and Action Recognition
