Leveraging Temporal Joint Depths for Improving 3D Human Pose Estimation in Video
Naoki Kato, Hiroto Honda, Yusuke Uchida

TL;DR
This paper introduces a method that leverages temporal joint depth information in videos to enhance the accuracy of 3D human pose estimation, addressing depth ambiguity issues present in 2D pose predictions.
Contribution
The paper proposes estimating a 3D pose in each frame of a video and then refining it with temporal information, which reduces the ambiguity of the joint depths and improves 3D pose estimation accuracy.
Findings
Reduced depth ambiguity in 3D pose estimation
Improved accuracy in 3D human pose predictions
Effective use of temporal information in videos
Abstract
The effectiveness of the approaches to predict 3D poses from 2D poses estimated in each frame of a video has been demonstrated for 3D human pose estimation. However, 2D poses without appearance information of persons have much ambiguity with respect to the joint depths. In this paper, we propose to estimate a 3D pose in each frame of a video and refine it considering temporal information. The proposed approach reduces the ambiguity of the joint depths and improves the 3D pose estimation accuracy.
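The abstract does not specify how the temporal refinement is carried out. As a rough illustration of the general idea only, the sketch below assumes the per-frame 3D poses are already available and smooths each joint's depth with a centered moving average over neighbouring frames. The moving average, the function name `refine_joint_depths`, and the `window` parameter are all assumptions made for this example, not the authors' method.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]  # (x, y, z); z is the joint depth
Pose = List[Joint]                  # one 3D pose: a list of joints


def refine_joint_depths(poses: List[Pose], window: int = 5) -> List[Pose]:
    """Smooth each joint's depth over time with a centered moving average.

    Hypothetical stand-in for a temporal refinement step: x and y are kept
    as estimated per frame, and only the depth z is averaged across frames.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    last = len(poses) - 1
    refined: List[Pose] = []
    for t, pose in enumerate(poses):
        # Clamp neighbour indices so edge frames reuse the first/last frame.
        neighbours = [min(max(t + k, 0), last) for k in range(-half, half + 1)]
        new_pose: Pose = []
        for j, (x, y, _) in enumerate(pose):
            z_mean = sum(poses[n][j][2] for n in neighbours) / window
            new_pose.append((x, y, z_mean))
        refined.append(new_pose)
    return refined
```

A larger `window` suppresses more frame-to-frame depth jitter but also blurs fast motion, so any fixed-size filter like this trades stability against responsiveness.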
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Diabetic Foot Ulcer Assessment and Management
