VSRD++: Autolabeling for 3D Object Detection via Instance-Aware Volumetric Silhouette Rendering

Zihua Liu; Hiroki Sakuma; Masatoshi Okutomi

arXiv:2512.01178 · cs.CV · December 2, 2025

TL;DR

VSRD++ is a weakly supervised framework for monocular 3D object detection. It uses instance-aware volumetric silhouette rendering to autolabel 3D bounding boxes from weak 2D supervision, removing the need for 3D annotations, and it outperforms existing weakly supervised methods on KITTI-360.

Contribution

The paper presents a two-stage weakly supervised pipeline: multi-view 3D autolabeling via neural-field-based volumetric rendering, followed by monocular 3D detector training, with no 3D annotations required.

Findings

01. Outperforms existing weakly supervised methods on KITTI-360

02. Effectively handles static and dynamic scenes

03. Utilizes volumetric silhouette rendering for autolabeling

Abstract

Monocular 3D object detection is a fundamental yet challenging task in 3D scene understanding. Existing approaches heavily depend on supervised learning with extensive 3D annotations, which are often acquired from LiDAR point clouds through labor-intensive labeling processes. To tackle this problem, we propose VSRD++, a novel weakly supervised framework for monocular 3D object detection that eliminates the reliance on 3D annotations and leverages neural-field-based volumetric rendering with weak 2D supervision. VSRD++ consists of a two-stage pipeline: multi-view 3D autolabeling and subsequent monocular 3D detector training. In the multi-view autolabeling stage, object surfaces are represented as signed distance fields (SDFs) and rendered as instance masks via the proposed instance-aware volumetric silhouette rendering. To optimize 3D bounding boxes, we decompose each instance's SDF into…
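
The abstract describes representing object surfaces as SDFs and rendering them as masks by volumetric rendering. The sketch below illustrates that general idea for a single object and a single ray. It is not the authors' formulation, which this excerpt does not give: the box-shaped SDF, the sigmoid SDF-to-density mapping, the `sharpness` parameter, and the function names `box_sdf` and `render_silhouette` are all assumptions chosen for illustration, and the instance-aware part of the method is not modeled here.

```python
import numpy as np


def box_sdf(points, center, half_size):
    """Signed distance from points (N, 3) to an axis-aligned box.

    Negative inside the box, positive outside.
    """
    q = np.abs(points - center) - half_size
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(np.max(q, axis=-1), 0.0)
    return outside + inside


def render_silhouette(origin, direction, center, half_size,
                      near=0.0, far=10.0, num_samples=128, sharpness=50.0):
    """Accumulated opacity along one ray, a soft silhouette value in [0, 1]."""
    t = np.linspace(near, far, num_samples)
    points = origin + t[:, None] * direction
    sdf = box_sdf(points, center, half_size)

    # Assumed SDF-to-density mapping (not taken from the paper):
    # density is high inside the surface and near zero outside.
    x = np.clip(sharpness * sdf, -60.0, 60.0)
    density = sharpness / (1.0 + np.exp(x))

    # Standard volume-rendering quadrature with uniform sample spacing.
    delta = (far - near) / (num_samples - 1)
    alpha = 1.0 - np.exp(-density * delta)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = transmittance * alpha
    return float(weights.sum())


center = np.array([0.0, 0.0, 5.0])
half_size = np.array([1.0, 1.0, 1.0])
forward = np.array([0.0, 0.0, 1.0])

# A ray through the box should be nearly opaque; a ray passing 4 units
# to the side should be nearly transparent.
hit = render_silhouette(np.array([0.0, 0.0, 0.0]), forward, center, half_size)
miss = render_silhouette(np.array([5.0, 0.0, 0.0]), forward, center, half_size)
print(hit, miss)
```

Because every step is differentiable in `center` and `half_size`, a silhouette rendered this way could in principle be compared against a 2D mask and the box parameters adjusted by gradient descent, which is the kind of weak 2D supervision the abstract refers to.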

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Neural Network Applications · Robotics and Sensor-Based Localization