Permutation-Invariant Relational Network for Multi-person 3D Pose Estimation

Nicolas Ugrinovic, Adria Ruiz, Antonio Agudo, Alberto Sanfeliu, Francesc Moreno-Noguer

arXiv:2204.04913 · cs.CV · June 1, 2022

TL;DR

This paper introduces a permutation-invariant neural network that models long-range interactions among all the people in a scene, improving the accuracy of multi-person 3D pose estimation from a single RGB image.

Contribution

It proposes a residual-like network built on Set Transformer blocks that refines initial multi-person 3D pose estimates by capturing long-range interactions among all people, independently of the order in which they are input.
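As a rough illustration of how a residual, order-independent refinement of this kind can be built, the NumPy sketch below applies one Set Transformer self-attention block (SAB) to a set of initial 3D poses and adds a predicted correction back onto them. It is not the authors' implementation: the flattened-pose input, the layer sizes, the single block, the parameter-free layer normalization, the random weights, and the helper names (`init_params`, `refine_poses`, `set_attention_block`) are all hypothetical choices made for this example.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each row (one person) to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multihead_attention(x, wq, wk, wv, wo, num_heads):
    # x: (N, D) set of N person embeddings. No positional encoding is added,
    # so rows are treated as an unordered set.
    n, d = x.shape
    dh = d // num_heads
    q = (x @ wq).reshape(n, num_heads, dh).transpose(1, 0, 2)  # (H, N, dh)
    k = (x @ wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, num_heads, dh).transpose(1, 0, 2)
    # (H, N, N): every person attends to every other person (long-range, not pairwise-only).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)
    out = softmax(scores, axis=-1) @ v  # (H, N, dh)
    return out.transpose(1, 0, 2).reshape(n, d) @ wo


def set_attention_block(x, p, num_heads=4):
    # SAB(X) = MAB(X, X): H = LN(X + MHA(X)), output = LN(H + rFF(H)).
    h = layer_norm(x + multihead_attention(x, p["wq"], p["wk"], p["wv"], p["wo"], num_heads))
    ff = np.maximum(h @ p["w1"], 0.0) @ p["w2"]  # row-wise feed-forward with ReLU
    return layer_norm(h + ff)


def init_params(num_joints, d_model=64, seed=0):
    """Random weights for illustration only; a real model would learn these."""
    rng = np.random.default_rng(seed)
    d_in = num_joints * 3

    def w(*shape):
        return rng.normal(0.0, 0.1, size=shape)

    return {
        "w_in": w(d_in, d_model),
        "wq": w(d_model, d_model),
        "wk": w(d_model, d_model),
        "wv": w(d_model, d_model),
        "wo": w(d_model, d_model),
        "w1": w(d_model, 2 * d_model),
        "w2": w(2 * d_model, d_model),
        "w_out": w(d_model, d_in),
    }


def refine_poses(poses, p, num_heads=4):
    # poses: (N, J, 3) initial 3D poses from some upstream detector.
    n, j, _ = poses.shape
    x = poses.reshape(n, j * 3) @ p["w_in"]   # embed each person independently
    x = set_attention_block(x, p, num_heads)  # reason jointly over the whole set
    delta = (x @ p["w_out"]).reshape(n, j, 3)  # per-person pose correction
    return poses + delta                       # residual-like update


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    poses = rng.normal(size=(3, 17, 3))  # 3 people, 17 joints (hypothetical)
    params = init_params(num_joints=17)
    refined = refine_poses(poses, params)
    perm = [2, 0, 1]
    print(np.allclose(refine_poses(poses[perm], params), refined[perm]))
```

Strictly, the per-person output of this sketch is permutation-equivariant: since attention uses no positional encoding and every other operation acts row by row, reordering the input people reorders the refined poses in the same way. No reference person or fixed ordering is needed, and any number of people can be passed in.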

Findings

01. Achieves state-of-the-art results on benchmark datasets.
02. Refines initial 3D pose estimates.
03. Runs efficiently as a drop-in module on top of existing 3D pose detectors.

Abstract

The recovery of multi-person 3D poses from a single RGB image is a severely ill-conditioned problem due to the inherent 2D-3D depth ambiguity, inter-person occlusions, and body truncations. To tackle these issues, recent works have shown promising results by reasoning about different people simultaneously. However, in most cases this is done by considering only pairwise person interactions, which prevents a holistic scene representation able to capture long-range interactions. Approaches that jointly process all people in the scene address this, but they require defining one of the individuals as a reference and a predefined person ordering, and they are sensitive to these choices. In this paper, we overcome both limitations and propose an approach for multi-person 3D pose estimation that captures long-range interactions independently of the input order. For this purpose,…
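The ordering problem can be made concrete with a toy comparison. The first function below is a hypothetical stand-in for an order-dependent model: it flattens the people in a fixed order and applies a single linear layer, so each block of weights is tied to an input slot ("person 0", "person 1", ...). The second uses a symmetric aggregation (a mean over people). Both are illustrative assumptions, not the models discussed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
num_people, feat = 3, 4

# Weights tied to input slots: rows 0..3 see "person 0", rows 4..7 "person 1", etc.
W_concat = rng.normal(size=(num_people * feat, feat))


def ordered_context(x):
    # Hypothetical order-dependent baseline: concatenate people in a fixed order.
    return x.reshape(-1) @ W_concat


def pooled_context(x):
    # Order-free alternative: a symmetric aggregation over the set of people.
    return x.mean(axis=0)


x = rng.normal(size=(num_people, feat))  # one feature vector per person
perm = [1, 2, 0]                          # same people, different labeling

print(np.allclose(ordered_context(x), ordered_context(x[perm])))
print(np.allclose(pooled_context(x), pooled_context(x[perm])))
```

Relabeling the same people changes the slot-tied output but leaves the symmetric aggregate unchanged; attention without positional encodings extends this property from a single pooled vector to a per-person output.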

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Vision and Imaging

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Dense Connections · Layer Normalization · Softmax · Residual Connection