SAM-pose2seg: Pose-Guided Human Instance Segmentation in Crowds

Constantin Kolomiiets; Miroslav Purkrabek; Jiri Matas

arXiv:2601.08982 · cs.CV · January 19, 2026

TL;DR

This paper introduces SAM-pose2seg, a pose-guided human segmentation method that makes SAM more robust to occlusion by incorporating pose keypoints during fine-tuning, enabling accurate segmentation from as few as a single keypoint.

Contribution

The paper adapts SAM 2.1 for pose-guided segmentation with minimal encoder modifications and introduces PoseMaskRefine, a fine-tuning strategy that improves occlusion handling and robustness.

Findings

01 Improved segmentation accuracy across multiple datasets.

02 Effective from as few as one keypoint during inference.

03 Maintains generalization capabilities of the original SAM.

Abstract

Segment Anything (SAM) provides an unprecedented foundation for human segmentation, but may struggle under occlusion, where keypoints may be partially or fully invisible. We adapt SAM 2.1 for pose-guided segmentation with minimal encoder modifications, retaining its strong generalization. Using a fine-tuning strategy called PoseMaskRefine, we incorporate pose keypoints with high visibility into the iterative correction process originally employed by SAM, yielding improved robustness and accuracy across multiple datasets. During inference, we simplify prompting by selecting only the three keypoints with the highest visibility. This strategy reduces sensitivity to common errors, such as missing body parts or misclassified clothing, and allows accurate mask prediction from as few as a single keypoint. Our results demonstrate that pose-guided fine-tuning of SAM enables effective,…
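
The inference-time prompting rule described in the abstract (keep only the three keypoints with the highest visibility) can be illustrated with a minimal sketch. This is not the authors' code: the function name `select_prompt_keypoints`, the array layout (one `(x, y)` row per keypoint plus one visibility score per keypoint), and the tie-breaking rule are all assumptions made for illustration, and no SAM API is called.

```python
import numpy as np


def select_prompt_keypoints(keypoints, visibility, k=3):
    """Pick the k keypoints with the highest visibility scores.

    Hypothetical helper: `keypoints` is assumed to be an (N, 2) array of
    (x, y) pixel coordinates and `visibility` an (N,) array of scores,
    where a larger score means the keypoint is more clearly visible.
    Returns the selected coordinates and their indices in the input.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    visibility = np.asarray(visibility, dtype=float)

    if keypoints.ndim != 2 or keypoints.shape[1] != 2:
        raise ValueError("keypoints must have shape (N, 2)")
    if visibility.shape != (keypoints.shape[0],):
        raise ValueError("visibility must have shape (N,)")

    # With fewer than k keypoints available, use all of them; this also
    # covers the single-keypoint case mentioned in the abstract.
    k = min(k, visibility.shape[0])

    # Stable sort on negated scores gives descending visibility; ties keep
    # their original order (an assumption, not taken from the paper).
    order = np.argsort(-visibility, kind="stable")[:k]
    return keypoints[order], order


if __name__ == "__main__":
    pts = [[10, 20], [30, 40], [50, 60], [70, 80]]
    vis = [0.2, 0.9, 0.5, 0.7]
    selected, idx = select_prompt_keypoints(pts, vis)
    print(idx.tolist())       # indices of the selected keypoints
    print(selected.tolist())  # their (x, y) coordinates
```

In a full pipeline the selected coordinates would presumably be passed to the segmentation model as point prompts; how the paper encodes them is not stated in the abstract, so that step is left out here.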

Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Advanced Neural Network Applications