Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning
Zhenyu Wei, Yujie He, Zhanchuan Cai

TL;DR
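This paper introduces MDETrack, an object tracking method that adds auxiliary monocular depth estimation learning (supervised or self-supervised) during training to improve RGB tracking accuracy without requiring real depth inputs. The depth head is discarded at inference, so inference speed is unchanged.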
Contribution
Proposes a unified tracking and depth estimation framework that enhances RGB-based tracking accuracy using self-supervised depth learning, without increasing inference complexity.
Findings
Improved tracking accuracy without real depth inputs
Self-supervised depth estimation benefits object tracking
Keeps the same inference speed, since the auxiliary depth head is discarded at inference
Abstract
RGB-D tracking significantly improves the accuracy of object tracking. However, its dependence on real depth inputs and the complexity of multi-modal fusion limit its applicability. The use of depth information in RGB-D tracking inspired us to propose a new method, named MDETrack, which trains a tracking network to also understand scene depth through supervised or self-supervised auxiliary Monocular Depth Estimation learning. The outputs of MDETrack's unified feature extractor are fed to two parallel heads: a tracking head and an auxiliary depth estimation head. The auxiliary module is discarded at inference, so inference speed is unchanged. We evaluated our models with various training strategies on multiple datasets, and the results show improved tracking accuracy even without real depth.…
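The two-head layout described in the abstract can be illustrated with a small, framework-free sketch. This is not the authors' implementation: the class name `AuxDepthTracker`, the toy lambda "networks", the L1 losses, and the `depth_weight` value are all assumptions made for illustration. In the self-supervised setting, the depth term would presumably be replaced by a loss that needs no ground-truth depth; the abstract does not specify it, so a plain target-based placeholder is used here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Vector = List[float]


@dataclass
class AuxDepthTracker:
    """Toy stand-in for a tracker with an auxiliary depth head (hypothetical names)."""
    extract: Callable[[Vector], Vector]     # shared (unified) feature extractor
    track_head: Callable[[Vector], Vector]  # predicts a box, e.g. (x, y, w, h)
    depth_head: Callable[[Vector], Vector]  # auxiliary head, used in training only

    def forward(self, image: Vector, training: bool) -> Tuple[Vector, Optional[Vector]]:
        feats = self.extract(image)  # computed once, shared by both heads
        box = self.track_head(feats)
        # The depth head is skipped at inference, so the tracking path
        # does the same work as a tracker without the auxiliary module.
        depth = self.depth_head(feats) if training else None
        return box, depth


def l1(pred: Vector, target: Vector) -> float:
    """Mean absolute error between two equal-length vectors."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def joint_loss(box: Vector, box_gt: Vector, depth: Vector, depth_target: Vector,
               depth_weight: float = 0.1) -> float:
    """Tracking loss plus a weighted auxiliary depth loss.

    depth_weight is an assumed hyperparameter, not a value from the paper.
    """
    return l1(box, box_gt) + depth_weight * l1(depth, depth_target)


# Toy "networks" so the sketch runs end to end.
model = AuxDepthTracker(
    extract=lambda img: [2.0 * v for v in img],
    track_head=lambda feats: feats[:4],
    depth_head=lambda feats: [v + 1.0 for v in feats],
)

train_box, train_depth = model.forward([1, 2, 3, 4, 5], training=True)
infer_box, infer_depth = model.forward([1, 2, 3, 4, 5], training=False)
```

During training both outputs feed `joint_loss`, so gradients from the depth term would shape the shared features. At inference `forward` returns `None` for depth and only the tracking output is computed.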
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face and Expression Recognition · IoT-based Smart Home Systems
