GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision
Lei Ke, Shichao Li, Yanan Sun, Yu-Wing Tai, Chi-Keung Tang

TL;DR
GSNet is an end-to-end framework that jointly estimates 6DoF vehicle poses and reconstructs detailed 3D vehicle shapes from a single street-view image, using a four-way feature extraction and fusion scheme and scene-aware regularization.
Contribution
The paper introduces GSNet, a joint vehicle pose and shape estimation framework that combines a four-way feature extraction and fusion scheme with a multi-objective loss built on geometrical consistency and scene context, achieving state-of-the-art results on ApolloCar3D.
Findings
Achieves state-of-the-art performance on the ApolloCar3D benchmark.
Reconstructs detailed 3D vehicle shapes with 1352 vertices and 2700 faces.
Improves pose estimation accuracy through geometrical and scene-aware regularization.
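The vertex and face counts reported in the findings are consistent with a closed, genus-0 triangle mesh. This is an assumption made here for illustration: the summary does not state that every face is a triangle or that the surface is watertight. Under that assumption, Euler's formula V - E + F = 2 together with E = 3F/2 gives F = 2V - 4. A minimal sketch of the arithmetic (the function name is hypothetical):

```python
def closed_triangle_mesh_counts(num_vertices):
    """Return (edges, faces) for a closed genus-0 triangle mesh.

    Assumes every face is a triangle and every edge is shared by exactly
    two faces, so E = 3F/2 and Euler's formula V - E + F = 2 applies.
    """
    faces = 2 * num_vertices - 4
    edges = 3 * faces // 2
    return edges, faces


edges, faces = closed_triangle_mesh_counts(1352)
print(edges, faces)  # 4050 2700
```

With 1352 vertices the formula yields exactly the 2700 faces quoted above, which suggests (but does not prove) a watertight triangular surface.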
Abstract
We present a novel end-to-end framework named GSNet (Geometric and Scene-aware Network), which jointly estimates 6DoF poses and reconstructs detailed 3D car shapes from a single urban street view image. GSNet utilizes a unique four-way feature extraction and fusion scheme and directly regresses 6DoF poses and shapes in a single forward pass. Extensive experiments show that our diverse feature extraction and fusion scheme can greatly improve model performance. Based on a divide-and-conquer 3D shape representation strategy, GSNet reconstructs 3D vehicle shapes in great detail (1352 vertices and 2700 faces). This dense mesh representation further leads us to consider geometrical consistency and scene context, and inspires a new multi-objective loss function to regularize network training, which in turn improves the accuracy of 6D pose estimation and validates the merit of jointly performing…
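To make the term "6DoF pose" concrete: it consists of three rotation parameters and a three-component translation that together place the reconstructed mesh in the camera frame. The sketch below applies such a pose to mesh vertices in plain Python. The Euler-angle order (yaw about Y, then pitch about X, then roll about Z) and all function names are assumptions for illustration; the abstract does not specify how GSNet parameterizes rotation.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_matrix(yaw, pitch, roll):
    """Build R = Ry(yaw) @ Rx(pitch) @ Rz(roll), angles in radians.

    The axis order is an assumed convention, not taken from the paper.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    rz = [[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]]
    return matmul3(matmul3(ry, rx), rz)


def apply_pose(vertices, rotation, translation):
    """Map each model-space vertex v to R @ v + t in the camera frame."""
    return [
        tuple(sum(rotation[i][k] * v[k] for k in range(3)) + translation[i]
              for i in range(3))
        for v in vertices
    ]


# Rotate a unit-X vertex by 90 degrees of yaw, then move it 10 units along Z.
R = rotation_matrix(math.pi / 2, 0.0, 0.0)
posed = apply_pose([(1.0, 0.0, 0.0)], R, (0.0, 0.0, 10.0))
print(posed)
```

Because the pose acts on every vertex of the mesh, an error in shape and an error in pose both show up as misplaced geometry, which is one intuition for why the abstract treats the two estimation tasks jointly.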
Taxonomy
Topics: Advanced Vision and Imaging · 3D Shape Modeling and Analysis · Human Pose and Action Recognition
