Towards Accurate Human Pose Estimation in Videos of Crowded Scenes
Li Yuan, Shuning Chang, Xuecheng Nie, Ziyuan Huang, Yichen Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

TL;DR
This paper enhances human pose estimation in crowded videos by leveraging temporal context through optical flow and expanding training data with internet-mined scenes, achieving state-of-the-art results.
Contribution
It introduces a temporal refinement method using optical flow and expands training data with new internet-mined scenes for improved accuracy in crowded scenes.
Findings
Achieved best performance on 7 out of 13 videos
56.33 average w_AP on HIE challenge test dataset
Effective use of temporal context improves pose estimation stability
Abstract
Video-based human pose estimation in crowded scenes is a challenging problem due to occlusion, motion blur, scale variation, and viewpoint change. Prior approaches often fail on this problem because they (1) make little use of temporal information and (2) lack training data from crowded scenes. In this paper, we focus on improving human pose estimation in videos of crowded scenes by exploiting temporal context and collecting new data. In particular, we first follow the top-down strategy to detect persons and perform single-person pose estimation for each frame. Then, we refine the frame-based pose estimation with temporal context derived from optical flow. Specifically, for each frame, we propagate the historical poses forward from the previous frames and the future poses backward from the subsequent frames to the current frame, leading to stable and accurate…
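The temporal refinement step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`warp_pose`, `fuse_poses`, `refine_pose`), the dense flow layout `flow[y][x] = (dx, dy)`, the nearest-pixel sampling, and the fixed fusion weights are all assumptions made here, since the abstract does not say how the propagated poses are combined.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Assumed layout: flow[y][x] = (dx, dy), the displacement of pixel (x, y).
Flow = Sequence[Sequence[Tuple[float, float]]]


def warp_pose(pose: Sequence[Point], flow: Flow) -> List[Point]:
    """Move each keypoint by the flow vector sampled at its nearest pixel."""
    height, width = len(flow), len(flow[0])
    warped = []
    for x, y in pose:
        # Clamp the lookup index so keypoints outside the image stay indexable.
        col = min(max(int(round(x)), 0), width - 1)
        row = min(max(int(round(y)), 0), height - 1)
        dx, dy = flow[row][col]
        warped.append((x + dx, y + dy))
    return warped


def fuse_poses(poses: Sequence[Sequence[Point]],
               weights: Sequence[float]) -> List[Point]:
    """Weighted average of several pose estimates, joint by joint."""
    total = sum(weights)
    fused = []
    for joints in zip(*poses):
        x = sum(w * p[0] for w, p in zip(weights, joints)) / total
        y = sum(w * p[1] for w, p in zip(weights, joints)) / total
        fused.append((x, y))
    return fused


def refine_pose(prev_pose: Sequence[Point],
                cur_pose: Sequence[Point],
                next_pose: Sequence[Point],
                flow_prev_to_cur: Flow,
                flow_next_to_cur: Flow,
                weights: Sequence[float] = (0.25, 0.5, 0.25)) -> List[Point]:
    """Propagate neighbouring poses to the current frame and fuse them.

    The default weights are a hypothetical choice for illustration only.
    """
    forwarded = warp_pose(prev_pose, flow_prev_to_cur)   # past -> current
    backwarded = warp_pose(next_pose, flow_next_to_cur)  # future -> current
    return fuse_poses([forwarded, cur_pose, backwarded], weights)
```

In practice the two flow fields would come from an optical-flow estimator run on adjacent frame pairs, and the poses for the same person would first need to be associated across frames; both steps are outside this sketch.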
