VSRD++: Autolabeling for 3D Object Detection via Instance-Aware Volumetric Silhouette Rendering
Zihua Liu, Hiroki Sakuma, Masatoshi Okutomi

TL;DR
VSRD++ is a weakly supervised framework for monocular 3D object detection. It uses volumetric silhouette rendering to autolabel 3D bounding boxes from weak 2D supervision, removing the need for 3D annotations, and it outperforms existing weakly supervised methods on KITTI-360.
Contribution
The paper presents a novel two-stage weakly supervised approach combining volumetric rendering and autolabeling to improve monocular 3D detection without 3D annotations.
Findings
Outperforms existing weakly supervised methods on KITTI-360
Effectively handles static and dynamic scenes
Utilizes volumetric silhouette rendering for autolabeling
Abstract
Monocular 3D object detection is a fundamental yet challenging task in 3D scene understanding. Existing approaches heavily depend on supervised learning with extensive 3D annotations, which are often acquired from LiDAR point clouds through labor-intensive labeling processes. To tackle this problem, we propose VSRD++, a novel weakly supervised framework for monocular 3D object detection that eliminates the reliance on 3D annotations and leverages neural-field-based volumetric rendering with weak 2D supervision. VSRD++ consists of a two-stage pipeline: multi-view 3D autolabeling and subsequent monocular 3D detector training. In the multi-view autolabeling stage, object surfaces are represented as signed distance fields (SDFs) and rendered as instance masks via the proposed instance-aware volumetric silhouette rendering. To optimize 3D bounding boxes, we decompose each instance's SDF into…
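To make the idea of rendering an SDF as a silhouette concrete, here is a minimal single-object sketch in NumPy. It is not the paper's formulation: the logistic SDF-to-density mapping, the `beta` sharpness parameter, the axis-aligned box SDF, and all function names are assumptions chosen for illustration, and the instance-aware part of the method is omitted.

```python
import numpy as np

def box_sdf(points, half_extents):
    """Signed distance from points (N, 3) to an axis-aligned box centered at the origin."""
    q = np.abs(points) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(np.max(q, axis=-1), 0.0)
    return outside + inside

def sdf_to_density(sdf, beta=0.05):
    """Assumed mapping: density is high inside the surface (sdf < 0), near zero outside.

    Uses a numerically stable form of sigmoid(-sdf / beta), scaled by 1 / beta.
    """
    return (1.0 / beta) * 0.5 * (1.0 - np.tanh(sdf / (2.0 * beta)))

def render_silhouette(origin, direction, half_extents, near=0.0, far=10.0, n_samples=128):
    """Accumulate opacity along one ray; the result in [0, 1] is the soft silhouette value."""
    t = np.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction
    sigma = sdf_to_density(box_sdf(points, half_extents))
    delta = np.diff(t, append=t[-1])          # sample spacing; the last interval is zero
    alpha = 1.0 - np.exp(-sigma * delta)      # per-sample opacity
    # Transmittance: probability that the ray reaches sample i without being absorbed.
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return float(np.sum(transmittance * alpha))

half_extents = np.array([1.0, 1.0, 1.0])
direction = np.array([0.0, 0.0, 1.0])
hit = render_silhouette(np.array([0.0, 0.0, -5.0]), direction, half_extents)
miss = render_silhouette(np.array([3.0, 0.0, -5.0]), direction, half_extents)
```

A ray that passes through the box yields a silhouette value close to 1 and a ray that misses it yields a value close to 0. Because the value varies smoothly with the box parameters, a rendered mask of this kind can in principle be compared against a 2D instance mask and used to optimize the box by gradient descent.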
Taxonomy
Topics: 3D Shape Modeling and Analysis · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
