Permutation-Invariant Relational Network for Multi-person 3D Pose Estimation
Nicolas Ugrinovic, Adria Ruiz, Antonio Agudo, Alberto Sanfeliu, Francesc Moreno-Noguer

TL;DR
This paper introduces a permutation-invariant neural network that models long-range interactions among multiple people in a scene, significantly improving 3D pose estimation accuracy from single images.
Contribution
It proposes a residual-like Set Transformer-based network that captures long-range interactions independently of input order, enhancing multi-person 3D pose estimation.
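To make "independently of input order" concrete, here is a minimal NumPy sketch of a residual self-attention refinement over a set of people. It is not the authors' architecture: the single attention head, the random weights, the hidden size, and the names `make_params` and `refine_poses` are all assumptions made for illustration. It only shows the general property that attention without positional encodings treats the people as a set, so shuffling the input people shuffles the refined poses in the same way and changes nothing else.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def make_params(dim, hidden, seed=0):
    # Hypothetical random weights; a real model would learn these.
    rng = np.random.default_rng(seed)
    return {
        "Wq": rng.normal(0.0, 0.1, (dim, hidden)),
        "Wk": rng.normal(0.0, 0.1, (dim, hidden)),
        "Wv": rng.normal(0.0, 0.1, (dim, hidden)),
        "Wo": rng.normal(0.0, 0.1, (hidden, dim)),
    }

def refine_poses(poses, params):
    """Refine initial 3D poses with one residual self-attention step.

    poses: array of shape (N, J, 3), N people with J joints each.
    Returns an array of the same shape.
    """
    n = poses.shape[0]
    x = poses.reshape(n, -1)                      # one token per person: (N, J*3)
    q = x @ params["Wq"]
    k = x @ params["Wk"]
    v = x @ params["Wv"]
    # Every person attends to every other person; no positional encoding is used.
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)   # (N, N)
    delta = (attn @ v) @ params["Wo"]             # correction informed by the whole scene
    return (x + delta).reshape(poses.shape)       # residual: initial estimate + correction

# Assumed toy input: 4 people, 17 joints each.
rng = np.random.default_rng(1)
poses = rng.normal(size=(4, 17, 3))
params = make_params(dim=17 * 3, hidden=32)

refined = refine_poses(poses, params)

# Shuffle the people and refine again: the outputs are shuffled identically.
perm = np.array([2, 0, 3, 1])
refined_perm = refine_poses(poses[perm], params)
order_independent = np.allclose(refined_perm, refined[perm])
```

The residual form means the block outputs the initial estimate plus a correction, which is one plausible reading of how such a module could sit on top of an existing detector's output.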
Findings
Achieves state-of-the-art results on benchmark datasets.
Effectively refines initial 3D pose estimates.
Works efficiently as a drop-in module for existing detectors.
Abstract
The recovery of multi-person 3D poses from a single RGB image is a severely ill-conditioned problem due to the inherent 2D-3D depth ambiguity, inter-person occlusions, and body truncations. To tackle these issues, recent works have shown promising results by simultaneously reasoning for different people. However, in most cases this is done by only considering pairwise person interactions, hindering thus a holistic scene representation able to capture long-range interactions. This is addressed by approaches that jointly process all people in the scene, although they require defining one of the individuals as a reference and a pre-defined person ordering, being sensitive to this choice. In this paper, we overcome both these limitations, and we propose an approach for multi-person 3D pose estimation that captures long-range interactions independently of the input order. For this purpose,…
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Layer Normalization · Absolute Position Encodings · Softmax · Residual Connection
