Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving

Jingxiao Zheng; Xinwei Shi; Alexander Gorban; Junhua Mao; Yang Song; Charles R. Qi; Ting Liu; Visesh Chari; Andre Cornman; Yin Zhou; Congcong Li; Dragomir Anguelov

arXiv:2112.12141 · cs.CV · December 23, 2021

TL;DR

This paper introduces a multi-modal approach for 3D human pose estimation in autonomous vehicles that leverages 2D image labels as weak supervision, combining LiDAR and camera data to improve accuracy.

Contribution

The paper presents one of the first methods to use 2D weak supervision with multi-modal data for 3D HPE in autonomous driving, reducing the reliance on 3D annotations, which are time-consuming and expensive to collect.

Findings

01. 22% improvement over the camera-only 2D HPE baseline

02. 6% improvement over the LiDAR-only model

03. A multi-modal architecture that combines LiDAR and camera inputs with an auxiliary segmentation branch

Abstract

3D human pose estimation (HPE) in autonomous vehicles (AV) differs from other use cases in many factors, including the 3D resolution and range of data, absence of dense depth maps, failure modes for LiDAR, relative location between the camera and LiDAR, and a high bar for estimation accuracy. Data collected for other use cases (such as virtual reality, gaming, and animation) may therefore not be usable for AV applications. This necessitates the collection and annotation of a large amount of 3D data for HPE in AV, which is time-consuming and expensive. In this paper, we propose one of the first approaches to alleviate this problem in the AV setting. Specifically, we propose a multi-modal approach which uses 2D labels on RGB images as weak supervision to perform 3D HPE. The proposed multi-modal architecture incorporates LiDAR and camera inputs with an auxiliary segmentation branch. On the…
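The abstract does not spell out how the 2D labels supervise the 3D predictions. As a rough illustration only, one common way to use 2D keypoint labels as weak supervision is a reprojection loss: project the predicted 3D keypoints into the image and penalize their distance to the labeled 2D keypoints. The sketch below assumes a pinhole camera with a known intrinsic matrix `K` and predictions already expressed in the camera frame. The function names, the visibility mask, and the example values are hypothetical and are not taken from the paper.

```python
import numpy as np


def project_points(points_3d, K):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates.

    Assumes a simple pinhole model with intrinsics K (3x3) and positive depth.
    """
    uvw = points_3d @ K.T            # homogeneous image coordinates, shape (N, 3)
    return uvw[:, :2] / uvw[:, 2:3]  # divide by depth to get pixels


def weak_2d_loss(pred_3d, labels_2d, visible, K):
    """Mean squared pixel error between projected 3D predictions and 2D labels.

    `visible` is a (N,) mask of 0/1 values; keypoints without a 2D label
    contribute nothing to the loss. This is an illustrative assumption,
    not the loss defined in the paper.
    """
    pred_2d = project_points(pred_3d, K)
    sq_err = np.sum((pred_2d - labels_2d) ** 2, axis=1)
    n_visible = max(int(visible.sum()), 1)  # avoid division by zero
    return float(np.sum(sq_err * visible) / n_visible)


# Hypothetical intrinsics: focal length 1000 px, principal point (640, 360).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

# Two predicted keypoints, 10 m in front of the camera.
pred_3d = np.array([[0.0, 0.0, 10.0],
                    [1.0, -0.5, 10.0]])

# 2D labels: the first matches exactly, the second is 5 px off horizontally.
labels_2d = np.array([[640.0, 360.0],
                      [745.0, 310.0]])
visible = np.array([1.0, 1.0])

loss = weak_2d_loss(pred_3d, labels_2d, visible, K)
print(loss)  # 12.5
```

A loss of this kind constrains only the image-plane position of each keypoint, not its depth, which is one reason a depth-bearing modality such as LiDAR is a natural complement to 2D labels.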

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging