Self-Supervised RGB-T Tracking with Cross-Input Consistency

Xingchen Zhang; Yiannis Demiris

arXiv:2301.11274·cs.CV·January 27, 2023

TL;DR

This paper introduces the first self-supervised RGB-T tracking method that learns from unlabeled video pairs, using cross-input consistency to achieve competitive performance without annotated data.

Contribution

It presents a novel self-supervised training strategy for RGB-T tracking based on cross-input consistency, eliminating the need for labeled RGB-T datasets.

Findings

01 Outperforms seven supervised RGB-T trackers on the GTOT dataset

02 Trains effectively using only unlabeled RGB-T video pairs

03 Demonstrates the viability of self-supervised learning for RGB-T tracking

Abstract

In this paper, we propose a self-supervised RGB-T tracking method. Different from existing deep RGB-T trackers that use a large number of annotated RGB-T image pairs for training, our RGB-T tracker is trained using unlabeled RGB-T video pairs in a self-supervised manner. We propose a novel cross-input consistency-based self-supervised training strategy based on the idea that tracking can be performed using different inputs. Specifically, we construct two distinct inputs using unlabeled RGB-T video pairs. We then track objects using these two inputs to generate results, based on which we construct our cross-input consistency loss. Meanwhile, we propose a reweighting strategy to make our loss function robust to low-quality training samples. We build our tracker on a Siamese correlation filter network. To the best of our knowledge, our tracker is the first self-supervised RGB-T tracker.…
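The cross-input consistency idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `consistency_loss` and `reweighted_batch_loss`, the use of a mean squared difference between two response maps, and the reweighting rule (a hard drop of the highest-loss samples in a batch). The abstract does not specify how the two inputs are constructed or how the reweighting is computed, so this shows only the general shape of such a loss.

```python
from typing import List, Sequence, Tuple

ResponseMap = Sequence[float]  # a flattened tracker response map (hypothetical format)


def consistency_loss(resp_a: ResponseMap, resp_b: ResponseMap) -> float:
    """Mean squared difference between the responses from two inputs.

    Assumption: both trackers should agree on the target location, so a
    smaller difference means the two inputs produced more consistent results.
    """
    if len(resp_a) != len(resp_b) or len(resp_a) == 0:
        raise ValueError("response maps must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(resp_a, resp_b)) / len(resp_a)


def reweighted_batch_loss(
    pairs: List[Tuple[ResponseMap, ResponseMap]],
    drop_fraction: float = 0.25,
) -> float:
    """Average the per-sample losses after discarding the worst samples.

    Assumption: low-quality training samples tend to show up as the largest
    losses, so giving them zero weight makes the batch loss more robust.
    This hard drop is a stand-in, not the paper's reweighting strategy.
    """
    if not pairs:
        raise ValueError("batch must contain at least one pair")
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    losses = sorted(consistency_loss(a, b) for a, b in pairs)
    n_keep = max(1, len(losses) - int(len(losses) * drop_fraction))
    kept = losses[:n_keep]  # the smallest losses survive
    return sum(kept) / len(kept)


# Toy batch: three consistent pairs and one inconsistent (low-quality) pair.
batch = [
    ([0.0, 1.0], [0.0, 1.0]),
    ([0.0, 1.0], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, 1.0]),
    ([0.0, 0.0], [0.0, 0.0]),
]
plain = reweighted_batch_loss(batch, drop_fraction=0.0)
robust = reweighted_batch_loss(batch, drop_fraction=0.25)
```

In this toy batch, the single inconsistent pair dominates the unweighted average, while the reweighted variant ignores it. A real tracker would compute the response maps with a network and backpropagate through a differentiable version of this loss.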

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection