3D Human Pose Estimation with Siamese Equivariant Embedding
Márton Véges, Viktor Varga, András Lőrincz

TL;DR
This paper introduces a siamese neural network architecture that learns a rotation-equivariant embedding for monocular 3D human pose estimation. The embedding reduces overfitting to the camera positions seen in training, and the method achieves a state-of-the-art cross-camera error rate among algorithms that use only estimated 2D joint coordinates.
Contribution
The authors propose a siamese architecture that learns a rotation-equivariant hidden representation, making 2D-to-3D pose lifting more robust to camera views absent from the training set and reducing the need for data augmentation.
Findings
Consistent improvement of error metrics across multiple databases
State-of-the-art cross-camera error rate among algorithms that use only estimated 2D joint coordinates
Improvement holds with different base networks
Abstract
In monocular 3D human pose estimation, a common setup is to first detect 2D positions and then lift the detections into 3D coordinates. Many algorithms suffer from overfitting to camera positions in the training set. We propose a siamese architecture that learns a rotation-equivariant hidden representation to reduce the need for data augmentation. Our method is evaluated on multiple databases with different base networks and shows a consistent improvement of error metrics. It achieves a state-of-the-art cross-camera error rate among algorithms that use only estimated 2D joint coordinates.
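The abstract does not spell out how rotation equivariance is enforced, so the following is only a minimal sketch of one way to express the constraint, not the authors' implementation. It assumes that each branch of the siamese network outputs an embedding shaped as a 3 x k matrix (k points in 3D), and that the relative rotation between the two cameras is known. The names `rotation_about_y`, `equivariance_loss`, `emb_a`, and `emb_b_ideal` are hypothetical and introduced here for illustration.

```python
import numpy as np

def rotation_about_y(angle_rad):
    """Return a 3x3 rotation matrix about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def equivariance_loss(emb_a, emb_b, rot_a_to_b):
    """Mean squared difference between the rotated embedding of view A
    and the embedding of view B.

    emb_a, emb_b: (3, k) arrays, assumed to be the outputs of the two
    weight-sharing branches for the same pose seen from two cameras.
    rot_a_to_b: (3, 3) relative rotation from camera A to camera B.
    """
    diff = rot_a_to_b @ emb_a - emb_b
    return float(np.mean(diff ** 2))

# Toy data standing in for network outputs (assumption: k = 8).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(3, 8))
rot = rotation_about_y(np.deg2rad(90.0))

# A perfectly equivariant encoder would produce exactly the rotated embedding.
emb_b_ideal = rot @ emb_a

loss_equivariant = equivariance_loss(emb_a, emb_b_ideal, rot)
loss_ignoring_rotation = equivariance_loss(emb_a, emb_b_ideal, np.eye(3))
```

Under these assumptions, `loss_equivariant` vanishes when the second branch's embedding is the rotated copy of the first, while `loss_ignoring_rotation` stays positive. Minimizing such a term alongside the usual 3D pose loss would push the shared encoder toward a representation that transforms predictably with the camera, which is the property the abstract credits with reducing overfitting to training camera positions.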
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging
