Sapiens2

Rawal Khirodkar; He Wen; Julieta Martinez; Yuan Dong; Su Zhaoen; Shunsuke Saito

arXiv:2604.21681·cs.CV·April 24, 2026

1 Repo 37 Models

TL;DR

Sapiens2 introduces a versatile high-resolution transformer model family for human-centric vision tasks, achieving state-of-the-art results across multiple benchmarks through novel pretraining and architectural innovations.

Contribution

The paper presents Sapiens2, a family of high-resolution transformers (0.4 to 5 billion parameters) with an improved pretraining objective, a larger curated pretraining dataset, and an updated architecture, which together yield better performance on diverse human-centric vision tasks.

Findings

01 Sapiens2 outperforms previous models on pose estimation (+4 mAP).

02 Improves body-part segmentation by 24.3 mIoU.

03 Reduces angular error in normal estimation by 45.6%.
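
The normal-estimation finding is stated as a relative reduction in angular error. As a minimal sketch of how such a metric is commonly computed (the paper's exact evaluation protocol is not given on this page, so the function names and the masking-free averaging below are assumptions), the mean angular error between predicted and ground-truth normals and the relative reduction between two error values could look like this:

```python
import numpy as np

def mean_angular_error_deg(pred, gt, eps=1e-8):
    """Mean angle in degrees between normals of shape (..., 3).

    Hypothetical helper: assumes every pixel is valid (no foreground mask).
    """
    # Normalize both inputs to unit length, guarding against zero vectors.
    pred = pred / np.maximum(np.linalg.norm(pred, axis=-1, keepdims=True), eps)
    gt = gt / np.maximum(np.linalg.norm(gt, axis=-1, keepdims=True), eps)
    # The dot product of unit vectors is the cosine of the angle between them.
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())

def relative_reduction_pct(baseline_err, new_err):
    """Percentage by which new_err is lower than baseline_err."""
    return 100.0 * (baseline_err - new_err) / baseline_err

# Illustrative values only, not results from the paper.
pred = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
gt = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
err = mean_angular_error_deg(pred, gt)
```

Under this definition, a 45.6% reduction means the new model's mean angular error is a little over half of the baseline's.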

Abstract

We present Sapiens2, a model family of high-resolution transformers for human-centric vision focused on generalization, versatility, and high-fidelity outputs. Our model sizes range from 0.4 to 5 billion parameters, with native 1K resolution and hierarchical variants that support 4K. Sapiens2 substantially improves over its predecessor in both pretraining and post-training. First, to learn features that capture low-level details (for dense prediction) and high-level semantics (for zero-shot or few-label settings), we combine masked image reconstruction with self-distilled contrastive objectives. Our evaluations show that this unified pretraining objective is better suited for a wider range of downstream tasks. Second, along the data axis, we pretrain on a curated dataset of 1 billion high-quality human images and improve the quality and quantity of task annotations. Third,…
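
The abstract describes the pretraining objective only at a high level: masked image reconstruction combined with a self-distilled contrastive term. The sketch below shows one plausible way to combine the two, and it is not the authors' implementation. It assumes a mean-squared-error reconstruction loss over masked patches and a DINO-style teacher-student cross-entropy as the self-distillation term. The function names, temperatures, and the `weight` parameter are all hypothetical.

```python
import numpy as np

def softmax(x, temp):
    """Temperature-scaled softmax over the last axis."""
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def masked_reconstruction_loss(pred_patches, target_patches, mask):
    """MSE averaged over masked patches only.

    pred_patches, target_patches: (N, D) arrays; mask: (N,) bool, True = masked.
    """
    per_patch = ((pred_patches - target_patches) ** 2).mean(axis=-1)
    return float((per_patch * mask).sum() / max(mask.sum(), 1))

def self_distillation_loss(student_logits, teacher_logits,
                           t_student=0.1, t_teacher=0.04):
    """Cross-entropy from teacher to student distributions (assumed form)."""
    p_teacher = softmax(teacher_logits, t_teacher)
    log_p_student = np.log(softmax(student_logits, t_student) + 1e-12)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

def combined_loss(pred_patches, target_patches, mask,
                  student_logits, teacher_logits, weight=1.0):
    """Weighted sum of the two terms; the weighting scheme is an assumption."""
    recon = masked_reconstruction_loss(pred_patches, target_patches, mask)
    distill = self_distillation_loss(student_logits, teacher_logits)
    return recon + weight * distill
```

The reconstruction term pushes features toward low-level detail, while the distillation term encourages consistent high-level embeddings across views, which matches the motivation given in the abstract.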

Code & Models

Repositories

facebookresearch/sapiens2 (GitHub)