Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning

Zhenyu Wei; Yujie He; Zhanchuan Cai

arXiv:2405.14195·cs.CV·May 24, 2024

TL;DR

This paper introduces MDETrack, an RGB object tracker trained with an auxiliary monocular depth estimation head, supervised or self-supervised. Tracking accuracy improves without real depth inputs, and because the auxiliary head is discarded at inference, inference speed is unchanged.

Contribution

Proposes a unified tracking and depth estimation framework that enhances RGB-based tracking accuracy using self-supervised depth learning, without increasing inference complexity.

Findings

01. Improved tracking accuracy without real depth inputs

02. Self-supervised depth estimation benefits object tracking

03. Maintains real-time inference speed

Abstract

RGB-D tracking significantly improves the accuracy of object tracking. However, its dependency on real depth inputs and the complexity involved in multi-modal fusion limit its applicability across various scenarios. The utilization of depth information in RGB-D tracking inspired us to propose a new method, named MDETrack, which trains a tracking network with an additional capability to understand the depth of scenes, through supervised or self-supervised auxiliary Monocular Depth Estimation learning. The outputs of MDETrack's unified feature extractor are fed to the side-by-side tracking head and auxiliary depth estimation head, respectively. The auxiliary module will be discarded in inference, thus keeping the same inference speed. We evaluated our models with various training strategies on multiple datasets, and the results show an improved tracking accuracy even without real depth.…
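
The training and inference layout described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class and function names (`AuxDepthTracker`, `joint_loss`), the layer sizes, the L1 losses, and the `depth_weight` value are all assumptions made for illustration, and the toy dense layers stand in for a real feature extractor and real heads. `depth_target` represents whatever depth signal is available, either real depth in the supervised case or a self-supervised training signal. The sketch shows only the structural idea: one shared feature extractor, two side-by-side heads during training, and the depth head dropped at inference.

```python
import random


class Linear:
    """Tiny dense layer on plain Python lists (toy stand-in for a real network)."""

    def __init__(self, in_dim, out_dim, seed):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                        for _ in range(out_dim)]

    def __call__(self, x):
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]


class AuxDepthTracker:
    """Hypothetical tracker with a shared extractor and two heads."""

    def __init__(self, in_dim=8, feat_dim=4):
        self.backbone = Linear(in_dim, feat_dim, seed=0)    # unified feature extractor
        self.track_head = Linear(feat_dim, 4, seed=1)       # box as (x, y, w, h)
        self.depth_head = Linear(feat_dim, in_dim, seed=2)  # auxiliary, training only

    def features(self, x):
        # Linear projection followed by ReLU.
        return [max(0.0, v) for v in self.backbone(x)]

    def forward_train(self, x):
        # Both heads read the same shared features.
        feats = self.features(x)
        return self.track_head(feats), self.depth_head(feats)

    def discard_auxiliary(self):
        # The depth head is not needed once training is finished.
        self.depth_head = None

    def forward_infer(self, x):
        # Inference touches only the backbone and the tracking head.
        return self.track_head(self.features(x))


def l1(pred, target):
    """Mean absolute error between two equal-length lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def joint_loss(box_pred, box_gt, depth_pred, depth_target, depth_weight=0.5):
    # Assumed form: tracking loss plus a weighted auxiliary depth loss.
    return l1(box_pred, box_gt) + depth_weight * l1(depth_pred, depth_target)


if __name__ == "__main__":
    model = AuxDepthTracker()
    frame = [0.1 * i for i in range(8)]
    box, depth = model.forward_train(frame)
    print("joint loss:", joint_loss(box, [0.0] * 4, depth, [1.0] * 8))
    model.discard_auxiliary()
    print("same box at inference:", model.forward_infer(frame) == box)
```

Because `forward_infer` never calls the depth head, removing it changes neither the predicted box nor the inference cost, which is the property the abstract relies on when it says inference speed is kept the same.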

Taxonomy

Topics: Video Surveillance and Tracking Methods · Face and Expression Recognition · IoT-based Smart Home Systems