Direct Prediction of 3D Body Poses from Motion Compensated Sequences
Bugra Tekin, Artem Rozantsev, Vincent Lepetit, Pascal Fua

TL;DR
This paper introduces a direct method for estimating 3D human poses from motion-compensated video sequences. It improves accuracy substantially over approaches that estimate poses frame by frame and link them afterward, because it exploits spatio-temporal information from consecutive frames.
Contribution
It presents a novel approach that directly regresses the 3D pose in the central frame from a motion-compensated spatio-temporal volume of bounding boxes, outperforming existing methods on several benchmarks.
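To make "spatio-temporal volume" concrete, here is a minimal sketch of how such a volume could be assembled: crops from a window of consecutive frames around a central frame are stacked, and the 3D pose of that central frame is the regression target. The function names, the `(top, left, height, width)` box convention and the list-of-lists image format are assumptions for illustration only. The feature extraction and the regressor that map the volume to a 3D pose are not shown.

```python
from typing import List, Tuple

Frame = List[List[float]]        # grayscale image stored as rows of pixel values
Box = Tuple[int, int, int, int]  # hypothetical convention: (top, left, height, width)


def crop(frame: Frame, box: Box) -> Frame:
    """Cut one bounding box out of a frame (assumes the box lies inside it)."""
    top, left, h, w = box
    return [row[left:left + w] for row in frame[top:top + h]]


def spatio_temporal_volume(frames: List[Frame], boxes: List[Box],
                           center: int, half_window: int) -> List[Frame]:
    """Stack crops from frames center-half_window .. center+half_window.

    The 3D pose to regress from this volume is the one at frames[center].
    """
    if center - half_window < 0 or center + half_window >= len(frames):
        raise ValueError("temporal window runs past the sequence")
    return [crop(frames[t], boxes[t])
            for t in range(center - half_window, center + half_window + 1)]


if __name__ == "__main__":
    # Five synthetic 6x6 frames whose pixel value encodes (t, row, col).
    frames = [[[t * 100 + r * 10 + c for c in range(6)] for r in range(6)]
              for t in range(5)]
    boxes = [(1, 2, 3, 2)] * 5
    vol = spatio_temporal_volume(frames, boxes, center=2, half_window=1)
    print(len(vol), vol[1])
```

Keeping the window symmetric around the central frame means the pose being predicted can draw on both past and future frames, which is what lets the volume resolve ambiguities a single frame cannot.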
Findings
Achieves state-of-the-art results on Human3.6m, HumanEva, and KTH Multiview Football datasets.
Demonstrates the importance of motion compensation for accurate 3D pose estimation.
Outperforms previous methods by a large margin.
Abstract
We propose an efficient approach to exploiting motion information from consecutive frames of a video sequence to recover the 3D pose of people. Previous approaches typically compute candidate poses in individual frames and then link them in a post-processing step to resolve ambiguities. By contrast, we directly regress from a spatio-temporal volume of bounding boxes to a 3D pose in the central frame. We further show that, for this approach to achieve its full potential, it is essential to compensate for the motion in consecutive frames so that the subject remains centered. This then allows us to effectively overcome ambiguities and improve upon the state-of-the-art by a large margin on the Human3.6m, HumanEva, and KTH Multiview Football 3D human pose estimation benchmarks.
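The abstract's point about keeping the subject centered can be illustrated with a small sketch. It assumes the subject's image-plane center is already known in every frame (a hypothetical input; how the paper estimates the required shift is not described here) and re-positions each fixed-size bounding box on that center, clamped to the image. The helper `center_offset` measures how far the subject sits from the middle of a box, so the effect of compensation can be checked directly.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical convention: (top, left, height, width)
Point = Tuple[float, float]      # assumed known (row, col) center of the subject


def center_offset(box: Box, subject: Point) -> Tuple[float, float]:
    """Offset of the subject from the middle of the box, in pixels."""
    top, left, h, w = box
    return (subject[0] - (top + h / 2), subject[1] - (left + w / 2))


def compensate(boxes: List[Box], subjects: List[Point],
               image_hw: Tuple[int, int]) -> List[Box]:
    """Move each fixed-size box so it is centered on the subject.

    Box size is unchanged; the position is clamped so the box stays
    inside an image of size image_hw = (height, width).
    """
    img_h, img_w = image_hw
    out = []
    for (_, _, h, w), (cy, cx) in zip(boxes, subjects):
        top = min(max(round(cy - h / 2), 0), img_h - h)
        left = min(max(round(cx - w / 2), 0), img_w - w)
        out.append((top, left, h, w))
    return out


if __name__ == "__main__":
    # The subject drifts 3 px to the right per frame while the box stays put.
    fixed = [(40, 40, 20, 20)] * 5
    subjects = [(50.0, 44.0 + 3 * i) for i in range(5)]
    print([center_offset(b, s) for b, s in zip(fixed, subjects)])
    comp = compensate(fixed, subjects, (100, 100))
    print([center_offset(b, s) for b, s in zip(comp, subjects)])
```

With a fixed box the horizontal offset grows from -6 to +6 pixels across the window; after compensation every offset is zero, so corresponding body parts line up across the stacked crops.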
