Capsules as viewpoint learners for human pose estimation
Nicola Garau, Nicola Conci

TL;DR
This paper introduces a novel capsule network architecture for human pose estimation that achieves viewpoint equivariance, leading to better generalization across different camera angles and reduced data dependency.
Contribution
The work presents a new end-to-end capsule autoencoder with Variational Bayes routing for human pose estimation, improving viewpoint generalization and inference speed.
Findings
State-of-the-art results on multiple datasets
Enhanced generalization to unseen viewpoints
Lower data dependency and faster inference
Abstract
The task of human pose estimation (HPE) deals with the ill-posed problem of estimating the 3D position of human joints directly from images and videos. In recent literature, most works tackle the problem using convolutional neural networks (CNNs), which achieve state-of-the-art results on most datasets. We show that most neural networks do not generalize well when the camera is subject to significant viewpoint changes. This behaviour emerges because CNNs cannot model viewpoint equivariance and instead rely on viewpoint invariance, resulting in high data dependency. Recently, capsule networks (CapsNets) have been proposed in the multi-class classification field as a solution to the viewpoint equivariance issue, reducing the size and complexity of both the training datasets and the network itself. In this work, we…
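The distinction the abstract draws between viewpoint invariance and viewpoint equivariance can be illustrated with a small NumPy sketch. It is only a toy example and not the paper's capsule architecture: the function names (`rotation_y`, `invariant_feature`, `equivariant_feature`), the 17-joint random skeleton, and the choice of a rotation about the vertical axis as the "viewpoint change" are all assumptions made for illustration.

```python
import numpy as np

def rotation_y(theta):
    """Rotation matrix about the vertical (y) axis; a stand-in for a camera viewpoint change."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def invariant_feature(joints):
    """Pairwise joint distances: identical for every rotated copy of the pose."""
    diff = joints[:, None, :] - joints[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def equivariant_feature(joints):
    """Mean-centred joint coordinates: they rotate together with the input pose."""
    return joints - joints.mean(axis=0, keepdims=True)

# Hypothetical skeleton: 17 joints with random 3D positions.
rng = np.random.default_rng(0)
joints = rng.normal(size=(17, 3))

R = rotation_y(np.pi / 3)
rotated = joints @ R.T  # the same pose seen from a rotated viewpoint

# Invariance: f(Rx) == f(x), so the viewpoint information is discarded.
inv_same = np.allclose(invariant_feature(rotated), invariant_feature(joints))

# Equivariance: f(Rx) == R f(x), so the viewpoint change is carried through.
eq_same = np.allclose(equivariant_feature(rotated), equivariant_feature(joints) @ R.T)

print("invariant feature unchanged:", inv_same)
print("equivariant feature transforms with input:", eq_same)
```

An invariant feature gives the same output for every viewpoint, so a model relying on it needs training examples from each viewpoint to learn that they match. An equivariant feature changes predictably with the viewpoint, which is the property the abstract attributes to capsule networks.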
Taxonomy
Topics: Human Pose and Action Recognition · Infrared Thermography in Medicine · Diabetic Foot Ulcer Assessment and Management
Methods: Test
