TL;DR
Sapiens2 introduces a versatile high-resolution transformer model family for human-centric vision tasks, achieving state-of-the-art results across multiple benchmarks through novel pretraining and architectural innovations.
Contribution
The paper presents Sapiens2, a new family of high-resolution transformer models with improved pretraining strategies, dataset, and architecture, enabling superior performance on diverse human-centric vision tasks.
Findings
- Outperforms previous models on pose estimation (+4 mAP)
- Improves body-part segmentation by 24.3 mIoU
- Reduces angular error in normal estimation by 45.6%
Abstract
We present Sapiens2, a model family of high-resolution transformers for human-centric vision focused on generalization, versatility, and high-fidelity outputs. Our model sizes range from 0.4 to 5 billion parameters, with native 1K resolution and hierarchical variants that support 4K. Sapiens2 substantially improves over its predecessor in both pretraining and post-training. First, to learn features that capture low-level details (for dense prediction) and high-level semantics (for zero-shot or few-label settings), we combine masked image reconstruction with self-distilled contrastive objectives. Our evaluations show that this unified pretraining objective is better suited for a wider range of downstream tasks. Second, along the data axis, we pretrain on a curated dataset of 1 billion high-quality human images and improve the quality and quantity of task annotations. Third,…
Code & Models
- facebook/sapiens2 (model)
- facebook/sapiens2-pretrain-0.4b (model)
- facebook/sapiens2-pretrain-5b (model)
- facebook/sapiens2-matting-1b (model)
- facebook/sapiens2-pretrain-0.1b (model)
- facebook/sapiens2-pretrain-0.8b (model)
- facebook/sapiens2-pretrain-1b (model)
- facebook/sapiens2-pose-0.4b (model)
- facebook/sapiens2-pose-0.8b (model)
- facebook/sapiens2-pose-1b (model)
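
A minimal sketch, in Python, of resolving one of the repository names listed above and fetching its files from the Hugging Face Hub. `huggingface_hub.snapshot_download` is an existing function, but the helper names (`LISTED_REPOS`, `repo_id`, `fetch`) are hypothetical and introduced only for illustration. The layout of the downloaded files is not described on this page, so loading the weights is left out.

```python
# Repository names copied from the list above; the helpers are illustrative only.
LISTED_REPOS = {
    "facebook/sapiens2",
    "facebook/sapiens2-pretrain-0.1b",
    "facebook/sapiens2-pretrain-0.4b",
    "facebook/sapiens2-pretrain-0.8b",
    "facebook/sapiens2-pretrain-1b",
    "facebook/sapiens2-pretrain-5b",
    "facebook/sapiens2-matting-1b",
    "facebook/sapiens2-pose-0.4b",
    "facebook/sapiens2-pose-0.8b",
    "facebook/sapiens2-pose-1b",
}


def repo_id(task: str, size: str) -> str:
    """Build 'facebook/sapiens2-<task>-<size>' and check it against the list."""
    candidate = f"facebook/sapiens2-{task}-{size}"
    if candidate not in LISTED_REPOS:
        raise ValueError(f"{candidate} is not one of the listed repositories")
    return candidate


def fetch(task: str, size: str) -> str:
    """Download every file in the repository and return the local directory.

    Assumes the `huggingface_hub` package is installed and network access is
    available; the import is deferred so the helpers above work without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id(task, size))


if __name__ == "__main__":
    # Resolves the name only; call fetch("pose", "1b") to actually download.
    print(repo_id("pose", "1b"))
```

Checking names against the listed set before downloading avoids requesting a task and size combination that this page does not list, such as a 5b matting model.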
