TL;DR
This paper introduces a spatio-temporal fusion method for multi-view 3D human pose estimation. It uses sparse interleaved inputs, in which different camera views are captured at different time points, to raise temporal resolution and accuracy. Its DenseWarper model relies on epipolar geometry.
Contribution
The paper proposes a sparse interleaved input scheme and the DenseWarper model, which together improve 3D pose estimation by capturing rich spatio-temporal information and raising the output pose frame rate.
Findings
Outperforms traditional dense multi-view methods on Human3.6M and MPI-INF-3DHP datasets.
Achieves state-of-the-art performance with sparse interleaved inputs.
Theoretically increases the output pose frame rate by a factor of N with N cameras.
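The frame-rate claim can be illustrated with a little arithmetic. The sketch below is not the paper's code: it assumes that the N cameras share one frame rate and that their shutters are staggered evenly within a single frame period, which is one simple way to obtain interleaved inputs. The function names and the example values (4 cameras at 50 fps) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def interleaved_schedule(num_cameras, camera_fps, frames_per_camera):
    """Return (timestamp_in_seconds, camera_index) pairs for staggered capture.

    Assumption (not from the paper): every camera runs at `camera_fps`, and
    camera i is delayed by i / (num_cameras * camera_fps) seconds.
    """
    period = Fraction(1, camera_fps)   # time between two frames of one camera
    offset = period / num_cameras      # delay between consecutive cameras
    schedule = []
    for k in range(frames_per_camera):
        for cam in range(num_cameras):
            schedule.append((k * period + cam * offset, cam))
    return schedule


def effective_fps(schedule):
    """Rate at which new frames (from any camera) arrive in the schedule."""
    gaps = {b[0] - a[0] for a, b in zip(schedule, schedule[1:])}
    if len(gaps) != 1:
        raise ValueError("frames are not evenly spaced")
    return 1 / gaps.pop()


# Illustrative values only: 4 cameras, each capturing at 50 fps.
schedule = interleaved_schedule(num_cameras=4, camera_fps=50, frames_per_camera=3)
print(int(effective_fps(schedule)))  # 200
```

Under these assumptions a new image arrives every 1/(N x fps) seconds, so a model that outputs one pose per incoming image produces poses at N times the single-camera rate. This is the theoretical upper bound stated above.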
Abstract
In multi-view 3D human pose estimation, models typically rely on images captured simultaneously from different camera views to predict a pose at a specific moment. While this traditional approach provides accurate spatial information, it often overlooks the rich temporal dependencies between adjacent frames. To address this, we propose a novel input method for 3D human pose estimation: the sparse interleaved input. This method leverages images captured from different camera views at different time points (e.g., View 1 at one time step and View 2 at another), allowing our model to capture rich spatio-temporal information and effectively boost performance. More importantly, this approach offers two key advantages. First, it can theoretically increase the output pose frame rate by a factor of N with N cameras, thereby breaking through single-view frame rate limitations and enhancing the temporal…
