Attention-Based Depth Distillation with 3D-Aware Positional Encoding for Monocular 3D Object Detection

Zizhang Wu; Yunzhe Wu; Jian Pu; Xianzhi Li; Xiaoquan Wang

arXiv:2211.16779·cs.CV·July 4, 2023·1 cites

TL;DR

This paper introduces ADD, a novel attention-based knowledge distillation framework with 3D-aware positional encoding, significantly improving monocular 3D object detection accuracy without extra inference costs.

Contribution

The paper proposes a new knowledge distillation method using a teacher with ground-truth depth, featuring 3D-aware self- and cross-attention modules for better 3D feature learning.
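
The teacher-student setup described above can be illustrated with a minimal sketch. It assumes the student is trained with its usual detection loss plus a term that pulls its intermediate features toward those of a frozen teacher that also receives ground-truth depth. The mean-squared-error form, the function names, and the `distill_weight` parameter are hypothetical choices for illustration; this summary does not state the loss the paper actually uses, and the attention modules are not modeled here.

```python
def feature_distillation_loss(student_feats, teacher_feats):
    """Mean squared error between two flat feature vectors of equal length.

    Assumption: MSE is used only as a simple stand-in for a feature-matching
    term; it is not taken from the paper.
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("feature vectors must have the same length")
    squared_diffs = [(s - t) ** 2 for s, t in zip(student_feats, teacher_feats)]
    return sum(squared_diffs) / len(squared_diffs)


def total_training_loss(detection_loss, student_feats, teacher_feats,
                        distill_weight=1.0):
    """Combine the student's detection loss with the weighted distillation term.

    The teacher features are treated as fixed targets. At inference time only
    the student runs, so this extra term adds no inference cost.
    """
    distill = feature_distillation_loss(student_feats, teacher_feats)
    return detection_loss + distill_weight * distill


if __name__ == "__main__":
    # Toy values: the teacher (with depth) and student (image only) disagree
    # on the second feature.
    student = [1.0, 2.0]
    teacher = [1.0, 4.0]
    print(feature_distillation_loss(student, teacher))
    print(total_training_loss(0.5, student, teacher, distill_weight=0.1))
```

Because the teacher appears only inside the training loss, dropping it after training leaves the student's architecture and runtime unchanged, which is consistent with the claim of no additional inference cost.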

Findings

01. Achieves state-of-the-art results on the KITTI benchmark

02. Adds no inference cost over the baseline models

03. Is effective across multiple monocular detectors

Abstract

Monocular 3D object detection is a low-cost but challenging task, as it requires generating accurate 3D localization solely from a single image input. Recently developed depth-assisted methods show promising results by using explicit depth maps as intermediate features, which are either precomputed by monocular depth estimation networks or jointly evaluated with 3D object detection. However, inevitable errors from estimated depth priors may lead to misaligned semantic information and 3D localization, resulting in feature smearing and suboptimal predictions. To mitigate this issue, we propose ADD, an Attention-based Depth knowledge Distillation framework with 3D-aware positional encoding. Unlike previous knowledge distillation frameworks that adopt stereo- or LiDAR-based teachers, we build our teacher with the same architecture as the student but with extra ground-truth depth as…

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Industrial Vision Systems and Defect Detection

Methods: Knowledge Distillation