SAM-pose2seg: Pose-Guided Human Instance Segmentation in Crowds
Constantin Kolomiiets, Miroslav Purkrabek, Jiri Matas

TL;DR
This paper introduces SAM-pose2seg, a pose-guided human segmentation method that makes SAM more robust to occlusion by incorporating pose keypoints during fine-tuning, enabling accurate segmentation from as few as one keypoint.
Contribution
The paper adapts SAM 2.1 for pose-guided segmentation with minimal encoder modifications and introduces PoseMaskRefine, a fine-tuning strategy that improves occlusion handling and robustness.
Findings
Improved segmentation accuracy across multiple datasets.
Effective from as few as one keypoint during inference.
Maintains generalization capabilities of the original SAM.
Abstract
Segment Anything (SAM) provides an unprecedented foundation for human segmentation, but may struggle under occlusion, where keypoints may be partially or fully invisible. We adapt SAM 2.1 for pose-guided segmentation with minimal encoder modifications, retaining its strong generalization. Using a fine-tuning strategy called PoseMaskRefine, we incorporate pose keypoints with high visibility into the iterative correction process originally employed by SAM, yielding improved robustness and accuracy across multiple datasets. During inference, we simplify prompting by selecting only the three keypoints with the highest visibility. This strategy reduces sensitivity to common errors, such as missing body parts or misclassified clothing, and allows accurate mask prediction from as few as a single keypoint. Our results demonstrate that pose-guided fine-tuning of SAM enables effective,…
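The inference-time prompting rule described in the abstract (keep only the three most visible keypoints) can be illustrated with a minimal sketch. This is not the authors' code: the function name `select_prompt_keypoints`, the `(x, y, visibility)` tuple layout, and the `min_visibility` cutoff are assumptions made for illustration. The only convention taken from SAM itself is that a point prompt labeled 1 marks foreground.

```python
from typing import List, Tuple

# Hypothetical layout: (x, y, visibility score); higher score = more visible.
Keypoint = Tuple[float, float, float]


def select_prompt_keypoints(
    keypoints: List[Keypoint],
    k: int = 3,
    min_visibility: float = 0.0,
) -> Tuple[List[Tuple[float, float]], List[int]]:
    """Return up to k most visible keypoints as point-prompt coords and labels."""
    # Drop keypoints at or below the (assumed) visibility cutoff.
    visible = [kp for kp in keypoints if kp[2] > min_visibility]
    # Sort by visibility, highest first, and keep the top k.
    top = sorted(visible, key=lambda kp: kp[2], reverse=True)[:k]
    coords = [(x, y) for x, y, _ in top]
    # Label 1 = positive (foreground) point in SAM-style point prompts.
    labels = [1] * len(coords)
    return coords, labels


if __name__ == "__main__":
    pose = [
        (10.0, 20.0, 0.9),
        (30.0, 40.0, 0.2),
        (50.0, 60.0, 0.7),
        (70.0, 80.0, 0.0),  # treated as invisible and skipped
        (15.0, 25.0, 0.8),
    ]
    coords, labels = select_prompt_keypoints(pose)
    print(coords, labels)
```

If fewer than three keypoints pass the cutoff, the function simply returns what is available, which matches the abstract's claim that a single keypoint can suffice as a prompt.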
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Advanced Neural Network Applications
