WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark
Chunhui Zhang, Li Liu, Guanjie Huang, Hao Wen, Xi Zhou, Yanfeng Wang

TL;DR
This paper introduces WebUOT-1M, the largest public underwater object tracking benchmark to date, with 1.1 million frames across 1,500 video clips, and proposes a knowledge distillation framework to improve tracking performance in underwater environments.
Contribution
The paper presents WebUOT-1M, a large-scale underwater tracking dataset, and a new knowledge distillation method to transfer open-air tracking knowledge to underwater models.
Findings
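With 1.1 million frames, 1,500 video clips, and 408 target categories, WebUOT-1M surpasses previous underwater tracking datasets such as UVOT400 in scale and diversity.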
The proposed distillation framework improves underwater tracking accuracy.
Evaluation on 30 trackers demonstrates WebUOT-1M's effectiveness as a benchmark.
Abstract
Underwater object tracking (UOT) is a foundational task for identifying and tracing submerged entities in underwater video sequences. However, current UOT datasets suffer from limitations in scale, diversity of target categories and scenarios covered, hindering the training and evaluation of modern tracking algorithms. To bridge this gap, we take the first step and introduce WebUOT-1M, i.e., the largest public UOT benchmark to date, sourced from complex and realistic underwater environments. It comprises 1.1 million frames across 1,500 video clips filtered from 408 target categories, largely surpassing previous UOT datasets, e.g., UVOT400. Through meticulous manual annotation and verification, we provide high-quality bounding boxes for underwater targets. Additionally, WebUOT-1M includes language prompts for video sequences, expanding its application areas, e.g., underwater vision-language…
Taxonomy
Topics: Underwater Acoustics Research · Target Tracking and Data Fusion in Sensor Networks · Underwater Vehicles and Communication Systems
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections
