Progressive Coordinate Transforms for Monocular 3D Object Detection
Li Wang, Li Zhang, Yi Zhu, Zhi Zhang, Tong He, Mu Li, Xiangyang Xue

TL;DR
This paper introduces Progressive Coordinate Transforms (PCT), a lightweight method for monocular 3D object detection that progressively refines object localization using a confidence-aware loss and semantic image features, and reports state-of-the-art results on the KITTI and Waymo benchmarks.
Contribution
The paper proposes a novel, lightweight coordinate refinement approach called PCT that enhances monocular 3D detection accuracy by progressively improving localization predictions.
Findings
Achieves state-of-the-art results on KITTI and Waymo benchmarks.
Demonstrates generalization across various coordinate-based 3D detection frameworks.
Provides a simple yet effective localization refinement mechanism.
Abstract
Recognizing and localizing objects in the 3D space is a crucial ability for an AI agent to perceive its surrounding environment. While significant progress has been achieved with expensive LiDAR point clouds, it poses a great challenge for 3D object detection given only a monocular image. While there exist different alternatives for tackling this problem, it is found that they are either equipped with heavy networks to fuse RGB and depth information or empirically ineffective to process millions of pseudo-LiDAR points. With in-depth examination, we realize that these limitations are rooted in inaccurate object localization. In this paper, we propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations. Specifically, a localization boosting mechanism with confidence-aware loss is introduced to…
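The abstract is cut off before it describes the mechanism, so the following is only a toy illustration of the general idea it names: refining a 3D location estimate in stages while weighting the localization error by a predicted confidence. The function names, the residual values, and the loss form (confidence-weighted L1 with a log penalty) are assumptions made for illustration, not the formulation used in the paper.

```python
import math


def confidence_aware_l1(pred, target, confidence, reg_weight=1.0):
    """Hypothetical confidence-weighted L1 loss for one 3D center.

    pred, target: (x, y, z) tuples; confidence: float in (0, 1].
    The -log(confidence) term penalizes driving the confidence to zero.
    This is an assumed, generic form, not the paper's loss.
    """
    l1 = sum(abs(p - t) for p, t in zip(pred, target))
    return confidence * l1 - reg_weight * math.log(confidence)


def progressive_refine(initial, residuals):
    """Add per-stage residuals to an initial estimate; return every stage."""
    estimates = [tuple(initial)]
    for residual in residuals:
        prev = estimates[-1]
        estimates.append(tuple(c + d for c, d in zip(prev, residual)))
    return estimates


# Made-up numbers: a coarse center estimate refined by two residual stages.
target = (2.0, 1.0, 20.0)
stages = progressive_refine(
    (1.0, 1.0, 17.0),
    [(0.5, 0.0, 2.0), (0.25, 0.0, 0.5)],
)
losses = [confidence_aware_l1(s, target, confidence=0.5) for s in stages]
```

In this toy setup each stage moves the estimate closer to the target, so the loss shrinks from stage to stage. In a trained detector the residuals and confidences would be network outputs rather than fixed values.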
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · 3D Surveying and Cultural Heritage
Methods: Progressive Coordinate Transforms (PCT)
