PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound
Zhijian Yang, Xiaoran Fan, Volkan Isler, Hyun Soo Park

TL;DR
This paper introduces PoseKernelLifter, a method that uses audio signals recorded along with a single image to reconstruct 3D human pose in metric scale, addressing the scale ambiguity that makes single-view 3D pose estimation geometrically ill-posed.
Contribution
It proposes a novel audio-based pose kernel and a multi-modal CNN to achieve metric 3D pose reconstruction, outperforming existing methods.
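The contribution rests on measuring how the body changes an acoustic impulse response. The snippet below is a minimal sketch that assumes the pose kernel can be approximated by the difference between impulse responses estimated with and without the person present. This is an assumption; the paper's exact definition and processing are not given on this page. Each response is estimated by regularized frequency-domain deconvolution of a linear chirp, and the delay of the body echo is then read off. The chirp band, sampling rate, toy channel taps, speed of sound, and helper names (`linear_chirp`, `estimate_impulse_response`) are all hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def linear_chirp(f0, f1, duration, fs):
    """Linear frequency sweep from f0 to f1 Hz (hypothetical probe signal)."""
    t = np.arange(int(round(duration * fs))) / fs
    k = (f1 - f0) / duration
    return np.sin(2 * np.pi * (f0 * t + 0.5 * k * t ** 2))

def estimate_impulse_response(sent, received, n_taps, eps=1e-3):
    """Regularized frequency-domain deconvolution of `received` by `sent`."""
    n = len(sent) + len(received)  # zero-pad so circular == linear convolution
    X = np.fft.rfft(sent, n)
    Y = np.fft.rfft(received, n)
    power = np.abs(X) ** 2
    # Wiener-style regularization avoids dividing by near-zero out-of-band bins.
    H = Y * np.conj(X) / (power + eps * power.max())
    return np.fft.irfft(H, n)[:n_taps]

fs = 48_000
sent = linear_chirp(2_000, 20_000, 0.01, fs)  # 10 ms sweep -> 480 samples

# Hypothetical noise-free toy channel: direct path + wall echo,
# plus one extra echo (at 120 samples) reflected by the body.
h_empty = np.zeros(201)
h_empty[10], h_empty[200] = 1.0, 0.3
h_body = np.zeros(201)
h_body[120] = 0.2

rec_empty = np.convolve(sent, h_empty)
rec_person = np.convolve(sent, h_empty + h_body)

# Body-induced response: difference of the two estimated impulse responses.
kernel = (estimate_impulse_response(sent, rec_person, 400)
          - estimate_impulse_response(sent, rec_empty, 400))

delay = int(np.argmax(np.abs(kernel)))
print(delay)                                   # 120
print(f"{delay / fs * SPEED_OF_SOUND:.2f} m")  # 0.86 m speaker -> body -> mic path
```

Subtracting the empty-room estimate cancels static paths such as the direct path and the wall echo. This is one simple way to isolate the body's contribution from the scene. The delay of the remaining peak, multiplied by the speed of sound, gives a path length in meters.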
Findings
Achieves accurate metric 3D pose reconstruction in real scenes.
Outperforms state-of-the-art methods in scale accuracy.
Introduces a generalizable pose kernel invariant to scene changes.
Abstract
Reconstructing the 3D pose of a person in metric scale from a single view image is a geometrically ill-posed problem. For example, we cannot measure the exact distance of a person to the camera from a single view image without additional scene assumptions (e.g., known height). Existing learning-based approaches circumvent this issue by reconstructing the 3D pose up to scale. However, many applications, such as virtual telepresence, robotics, and augmented reality, require metric scale reconstruction. In this paper, we show that audio signals recorded along with an image provide complementary information to reconstruct the metric 3D pose of the person. The key insight is that as the audio signals traverse the 3D space, their interactions with the body provide metric information about the body's pose. Based on this insight, we introduce a time-invariant transfer…
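The scale ambiguity described in the abstract can be illustrated with a toy example, which also shows why acoustic timing can break it. This is a minimal sketch and not the paper's method. It assumes a pinhole camera at the origin, treats each joint as a single point reflector, and uses a hypothetical speaker and microphone placement with a speed of sound of 343 m/s. Shrinking the person by 10% and moving them 10% closer leaves the projected joints unchanged, but the speaker-to-joint-to-microphone travel times differ.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air near room temperature (assumed)

def project(points_3d, focal=1000.0):
    """Pinhole projection of Nx3 camera-frame points (meters) to Nx2 pixels."""
    return focal * points_3d[:, :2] / points_3d[:, 2:3]

def round_trip_delays(points_3d, speaker, mic):
    """Travel time (s) of the path speaker -> point -> mic for each point.

    Simplifying assumption: each joint acts as an isolated point reflector.
    """
    d_in = np.linalg.norm(points_3d - speaker, axis=1)
    d_out = np.linalg.norm(points_3d - mic, axis=1)
    return (d_in + d_out) / SPEED_OF_SOUND

# Hypothetical toy "pose": three joints about 3 m in front of the camera
# (camera y-axis pointing down).
pose = np.array([[0.0, -0.8, 3.0],   # head
                 [0.0,  0.0, 3.0],   # pelvis
                 [0.0,  0.9, 3.0]])  # ankle
small = 0.9 * pose  # a 10% smaller person standing 10% closer

speaker = np.array([0.2, 0.0, 0.0])  # hypothetical placement beside the camera
mic = np.array([-0.2, 0.0, 0.0])

print(np.allclose(project(pose), project(small)))  # True: identical images
print(np.allclose(round_trip_delays(pose, speaker, mic),
                  round_trip_delays(small, speaker, mic)))  # False: delays differ
```

The projection divides by depth, so any uniform scaling about the camera center cancels out. The acoustic delays, by contrast, depend on absolute distances in meters. This is the kind of metric cue the abstract attributes to audio.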
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Human Motion and Animation
Methods: 3D Convolutional Neural Network
