JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes

Haimei Zhao; Jing Zhang; Sen Zhang; Dacheng Tao

arXiv:2207.07895 · cs.CV · July 19, 2022 · 1 citation

TL;DR

JPerceiver is a joint perception framework that simultaneously estimates depth, visual odometry, and bird's-eye-view scene layout from monocular videos, leveraging cross-view geometric transformation and attention mechanisms for improved accuracy and efficiency.

Contribution

The paper introduces an end-to-end multi-task learning approach that unifies depth, VO, and BEV layout estimation through cross-view geometric transformation and transfer modules, addressing the scale ambiguity of depth and VO.

Findings

01 Outperforms existing methods on the Argoverse, nuScenes, and KITTI datasets.

02 Achieves higher accuracy in depth, pose, and layout estimation.

03 Offers a more efficient model with reduced inference time.

Abstract

Depth estimation, visual odometry (VO), and bird's-eye-view (BEV) scene layout estimation present three critical tasks for driving scene perception, which is fundamental for motion planning and navigation in autonomous driving. Though they are complementary to each other, prior works usually focus on each individual task and rarely deal with all three tasks together. A naive way is to accomplish them independently in a sequential or parallel manner, but there are many drawbacks, i.e., 1) the depth and VO results suffer from the inherent scale ambiguity issue; 2) the BEV layout is directly predicted from the front-view image without using any depth-related information, although the depth map contains useful geometry clues for inferring scene layouts. In this paper, we address these issues by proposing a novel joint perception framework named JPerceiver, which can simultaneously estimate…
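The scale ambiguity mentioned in the abstract can be illustrated with a small numerical sketch. This is not taken from the paper's code: the intrinsics `K`, the pose `(R, t)`, the pixel, and the helper `reproject` are all hypothetical values chosen for illustration. It shows that scaling the depth and the camera translation by the same factor leaves the reprojected pixel unchanged, which is why reprojection-based supervision from monocular video alone cannot recover absolute scale.

```python
import numpy as np

def reproject(pixel, depth, K, R, t):
    """Warp a pixel from frame 1 into frame 2 given its depth and the relative pose.

    Hypothetical helper for illustration; assumes a pinhole camera model.
    """
    p_h = np.array([pixel[0], pixel[1], 1.0])      # homogeneous pixel coordinates
    point_cam1 = depth * (np.linalg.inv(K) @ p_h)  # back-project to a 3D point in frame 1
    point_cam2 = R @ point_cam1 + t                # express the point in frame 2
    proj = K @ point_cam2                          # project onto the image plane
    return proj[:2] / proj[2]                      # perspective division

# Assumed example values (not from the paper).
K = np.array([[720.0, 0.0, 640.0],
              [0.0, 720.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                    # no rotation between the two frames
t = np.array([0.0, 0.0, -1.0])   # camera moves one unit forward
pixel = (800.0, 400.0)
depth = 10.0
scale = 3.0

base = reproject(pixel, depth, K, R, t)
# Scaling depth and translation together yields the same pixel.
scaled_both = reproject(pixel, scale * depth, K, R, scale * t)
# Scaling depth alone does change the pixel.
scaled_depth_only = reproject(pixel, scale * depth, K, R, t)

print("base:             ", base)
print("depth and t x3:   ", scaled_both)
print("depth only x3:    ", scaled_depth_only)
```

Because the jointly scaled solution is indistinguishable in image space, an external metric cue is needed to fix the scale.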

Code & Models

Repositories

sunnyhelen/jperceiver
PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Dropout · Byte Pair Encoding · Adam · Residual Connection · Label Smoothing · Position-Wise Feed-Forward Layer