ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data
Haojie Zhao, Junsong Chen, Lijun Wang, Huchuan Lu

TL;DR
ARKitTrack is an RGB-D tracking dataset captured with the consumer-grade LiDAR scanners on Apple's iPhone and iPad. It is released with a unified baseline for box-level and pixel-level tracking.
Contribution
This paper presents ARKitTrack, an RGB-D tracking dataset of 300 sequences covering static and dynamic scenes, annotated with bounding boxes, frame-level attributes, and pixel-level target masks, along with a unified baseline for box-level and pixel-level tracking.
Findings
The dataset significantly facilitates RGB-D tracking research.
The proposed baseline outperforms existing methods.
Empirical analysis confirms the dataset's usefulness.
Abstract
Compared with traditional RGB-only visual tracking, few datasets have been constructed for RGB-D tracking. In this paper, we propose ARKitTrack, a new RGB-D tracking dataset for both static and dynamic scenes captured by consumer-grade LiDAR scanners equipped on Apple's iPhone and iPad. ARKitTrack contains 300 RGB-D sequences, 455 targets, and 229.7K video frames in total. Along with the bounding box annotations and frame-level attributes, we also annotate this dataset with 123.9K pixel-level target masks. Besides, the camera intrinsic and camera pose of each frame are provided for future developments. To demonstrate the potential usefulness of this dataset, we further present a unified baseline for both box-level and pixel-level tracking, which integrates RGB features with bird's-eye-view representations to better explore cross-modality 3D geometry. In-depth empirical analysis has…
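The abstract notes that per-frame camera intrinsics accompany the depth data and that the baseline uses bird's-eye-view representations. As a rough illustration of how those two pieces relate geometrically, the sketch below back-projects a depth map into 3D camera-frame points with a pinhole model and then counts points per cell on the ground plane. This is not the authors' baseline: the function names `backproject_depth` and `bev_occupancy`, the list-of-lists depth layout, the metric units, and the choice of the x-z plane as the bird's-eye view are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject_depth(depth: List[List[float]],
                      fx: float, fy: float,
                      cx: float, cy: float) -> List[Point]:
    """Lift a depth map to 3D camera-frame points with a pinhole model.

    Assumption: depth[v][u] is the distance along the optical axis, and
    non-positive values mark missing measurements.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip pixels without a valid depth reading
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def bev_occupancy(points: List[Point],
                  x_range: Tuple[float, float],
                  z_range: Tuple[float, float],
                  cell: float) -> List[List[int]]:
    """Count points per cell on the x-z plane, discarding the height axis y."""
    n_cols = int(round((x_range[1] - x_range[0]) / cell))
    n_rows = int(round((z_range[1] - z_range[0]) / cell))
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, _, z in points:
        col = int((x - x_range[0]) // cell)
        row = int((z - z_range[0]) // cell)
        if 0 <= row < n_rows and 0 <= col < n_cols:  # drop out-of-range points
            grid[row][col] += 1
    return grid


# Hypothetical 2x2 depth map with one missing pixel.
toy_depth = [[1.0, 1.0],
             [0.0, 2.0]]
toy_points = backproject_depth(toy_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
toy_grid = bev_occupancy(toy_points, x_range=(-1.0, 2.0), z_range=(0.0, 3.0), cell=1.0)
```

A learned tracker would typically pool features rather than raw counts into such a grid. The occupancy count here only shows the coordinate mapping from pixels and depth to a top-down layout.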
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · 3D Surveying and Cultural Heritage
