RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker
Yunfeng Li, Bo Wang, Jiuran Sun, Xueyi Wu, Ye Li

TL;DR
This paper introduces a new RGB-Sonar tracking benchmark and a novel spatial cross-attention transformer tracker, SCANet, which effectively fuses RGB and sonar data for underwater target tracking, achieving state-of-the-art results.
Contribution
The paper presents a new RGB-Sonar tracking dataset, a novel spatial cross-attention module, and a training method to improve underwater target tracking performance.
Findings
The RGBS50 benchmark is challenging for existing SOT trackers.
SCANet outperforms previous methods on the benchmark.
Spatial cross-attention effectively fuses RGB and sonar modalities.
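The fusion idea named in the last finding can be illustrated with a generic scaled dot-product cross-attention sketch, in which tokens from one modality attend to tokens from the other. This is not SCANet's spatial cross-attention layer, whose details are not given on this page. The function names, the toy feature values, and the absence of learned projection matrices are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: tokens from one modality (e.g. RGB), each a list of d floats.
    keys, values: tokens from the other modality (e.g. sonar).
    Returns one fused vector per query token.
    """
    d = len(queries[0])
    dim_v = len(values[0])
    fused = []
    for q in queries:
        # Similarity of this query token to every token of the other modality.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the other modality's value vectors.
        fused.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        )
    return fused


# Hypothetical toy features: 2 RGB tokens and 3 sonar tokens, dimension 2.
rgb_tokens = [[1.0, 0.0], [0.0, 1.0]]
sonar_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Each modality queries the other, giving one fused vector per query token.
rgb_enhanced = cross_attention(rgb_tokens, sonar_tokens, sonar_tokens)
sonar_enhanced = cross_attention(sonar_tokens, rgb_tokens, rgb_tokens)
```

Running attention in both directions is one common way to let each modality borrow evidence from the other. A real tracker would add learned query, key, and value projections and operate on feature maps extracted by a backbone.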
Abstract
Vision camera and sonar are naturally complementary in the underwater environment. Combining the information from two modalities will promote better observation of underwater targets. However, this problem has not received sufficient attention in previous research. Therefore, this paper introduces a new challenging RGB-Sonar (RGB-S) tracking task and investigates how to achieve efficient tracking of an underwater target through the interaction of RGB and sonar modalities. Specifically, we first propose an RGBS50 benchmark dataset containing 50 sequences and more than 87000 high-quality annotated bounding boxes. Experimental results show that the RGBS50 benchmark poses a challenge to currently popular SOT trackers. Second, we propose an RGB-S tracker called SCANet, which includes a spatial cross-attention module (SCAM) consisting of a novel spatial cross-attention layer and two…
Taxonomy
Topics: Infrared Target Detection Methodologies · Visual Attention and Saliency Detection · CCD and CMOS Imaging Sensors
Methods: Softmax · Concatenated Skip Connection
