Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot

Fabien Baradel; Matthieu Armando; Salma Galaaoui; Romain Brégier; Philippe Weinzaepfel; Grégory Rogez; Thomas Lucas

arXiv:2402.14654 · cs.CV · July 25, 2024 · 1 cites



TL;DR

Multi-HMR is a single-shot model that recovers whole-body 3D human meshes, including hands and facial expressions, for multiple people from a single RGB image. It uses a Vision Transformer backbone and is trained with a new dataset, CUFFS.

Contribution

It introduces Multi-HMR, a transformer-based approach for multi-person whole-body mesh recovery, and the CUFFS dataset for improved hand pose estimation.

Findings

01. Achieves state-of-the-art results on whole-body benchmarks.

02. Incorporating the CUFFS dataset improves hand pose predictions.

03. Remains fast and competitive with a ViT-S backbone at 448x448 input resolution.

Abstract

We present Multi-HMR, a strong single-shot model for multi-person 3D human mesh recovery from a single RGB image. Predictions encompass the whole body, i.e., including hands and facial expressions, using the SMPL-X parametric model and 3D location in the camera coordinate system. Our model detects people by predicting coarse 2D heatmaps of person locations, using features produced by a standard Vision Transformer (ViT) backbone. It then predicts their whole-body pose, shape and 3D location using a new cross-attention module called the Human Prediction Head (HPH), with one query attending to the entire set of features for each detected person. As direct prediction of fine-grained hands and facial poses in a single shot, i.e., without relying on explicit crops around body parts, is hard to learn from existing data, we introduce CUFFS, the Close-Up Frames of Full-Body Subjects dataset,…
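The two stages described in the abstract (coarse heatmap detection, then one query per detected person attending to all backbone features) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the 0.5 threshold, the use of the detected patch's feature as the query, and the single-head attention without learned projections are all assumptions made for illustration. The actual HPH uses learned weights and regresses SMPL-X parameters, which is omitted here.

```python
import math

def detect_people(heatmap, threshold=0.5):
    """Return (row, col) of every patch whose score exceeds the threshold.
    The threshold value is a hypothetical choice for this sketch."""
    return [(r, c)
            for r, row in enumerate(heatmap)
            for c, score in enumerate(row)
            if score > threshold]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, features):
    """Single-head scaled dot-product attention of one query over all features.
    No learned projections are applied (a simplification)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, f)) / math.sqrt(d)
              for f in features]
    weights = softmax(scores)
    pooled = [sum(w * f[i] for w, f in zip(weights, features))
              for i in range(d)]
    return pooled, weights

def human_queries(heatmap, feature_grid, threshold=0.5):
    """One query per detected person, each attending to the full feature set."""
    flat = [f for row in feature_grid for f in row]
    results = []
    for r, c in detect_people(heatmap, threshold):
        # Assumption: the query is initialised from the feature at the
        # detected patch location.
        query = feature_grid[r][c]
        pooled, weights = cross_attend(query, flat)
        results.append({"patch": (r, c), "pooled": pooled, "weights": weights})
    return results

# Toy 2x2 grid of patches with 2-dimensional features (made-up values).
heatmap = [[0.1, 0.9],
           [0.2, 0.7]]
feature_grid = [[[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 1.0], [0.5, 0.5]]]
results = human_queries(heatmap, feature_grid)
```

Each entry in `results` corresponds to one detected person; in a full model, the pooled vector would feed regression layers for pose, shape and 3D location.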


Code & Models

Repositories

naver/multi-hmr
PyTorch · Official

Models

🤗
naver/multiHMR_896_L
model · ♡ 3

Datasets

naver/CUFFS
dataset · 10 dl


Taxonomy

Topics: Advanced X-ray and CT Imaging

Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Concatenated Skip Connection · Dense Connections · Label Smoothing · Adam · Vision Transformer · Softmax · Multi-Head Attention