TEMPO: Efficient Multi-View Pose Estimation, Tracking, and Forecasting
Rohan Choudhury, Kris Kitani, Laszlo A. Jeni

TL;DR
TEMPO is an efficient multi-view model that improves 3D human pose estimation, tracking, and forecasting by leveraging spatiotemporal features, achieving higher accuracy and speed without scene-specific tuning.
Contribution
We introduce TEMPO, a novel model that reduces computation while enhancing multi-view pose estimation, tracking, and forecasting through a unified spatiotemporal representation.
Findings
Achieves 10% better MPJPE (mean per-joint position error) than TesseTrack.
Runs at 33x the FPS of TesseTrack.
Generalizes across datasets without scene-specific fine-tuning.
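For readers unfamiliar with the accuracy metric, the following is a minimal sketch of how MPJPE is commonly computed: the average Euclidean distance between predicted and ground-truth 3D joint positions. The function names, the toy joint coordinates, and the baseline numbers are illustrative assumptions and are not taken from the paper. Reading "10% better" as a relative reduction in error is also an assumption.

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: equal-length sequences of (x, y, z) joint positions,
    in the same units and the same joint order.
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    # Average the per-joint Euclidean distances.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_improvement(baseline_error, new_error):
    """Fractional error reduction relative to a baseline (0.10 means 10% lower)."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical two-joint skeleton: one joint is off by 5 units, one is exact.
gt = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
pred = [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0)]
toy_error = mpjpe(pred, gt)

# Hypothetical errors chosen only to illustrate a 10% relative reduction.
toy_gain = relative_improvement(20.0, 18.0)
```

Lower MPJPE is better, so an improvement shows up as a positive value from `relative_improvement`.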
Abstract
Existing volumetric methods for 3D human pose estimation are accurate, but computationally expensive and optimized for single time-step prediction. We present TEMPO, an efficient multi-view pose estimation model that learns a robust spatiotemporal representation, improving pose accuracy while also tracking and forecasting human pose. We significantly reduce computation compared to the state-of-the-art by recurrently computing per-person 2D pose features, fusing both spatial and temporal information into a single representation. In doing so, our model is able to use spatiotemporal context to predict more accurate human poses without sacrificing efficiency. We further use this representation to track human poses over time as well as predict future poses. Finally, we demonstrate that our model is able to generalize across datasets without scene-specific fine-tuning. TEMPO…
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Hand Gesture Recognition Systems
