Rethink Sparse Signals for Pose-guided Text-to-image Generation

Wenjie Xuan; Jing Zhang; Juhua Liu; Bo Du; Dacheng Tao

arXiv:2506.20983 · cs.CV · June 27, 2025

TL;DR

This paper introduces SP-Ctrl, a novel method that enhances sparse pose signals for text-to-image generation, achieving superior control and alignment while overcoming challenges associated with dense representations.

Contribution

We propose a learnable spatial representation and keypoint concept learning to improve sparse pose guidance in text-to-image generation, outperforming recent methods.

Findings

01 Outperforms recent spatially controllable T2I methods with sparse pose guidance

02 Matches the performance of dense signal-based methods in pose-guided generation

03 Demonstrates effective cross-species and diverse generation capabilities

Abstract

Recent works have favored dense signals (e.g., depth, DensePose) over sparse signals (e.g., OpenPose) to provide detailed spatial guidance for pose-guided text-to-image generation. However, dense representations raise new challenges, including editing difficulties and potential inconsistencies with textual prompts. This motivates us to revisit sparse signals for pose guidance, which remain underexplored despite their simplicity and shape-agnostic nature. This paper proposes a novel Spatial-Pose ControlNet (SP-Ctrl), which equips sparse signals with robust controllability for pose-guided image generation. Specifically, we extend OpenPose to a learnable spatial representation, making keypoint embeddings discriminative and expressive. Additionally, we introduce keypoint concept learning, which encourages keypoint tokens to attend to the spatial positions of each keypoint,…
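
To make the idea of a "learnable spatial representation" concrete, here is a minimal sketch, not the authors' implementation. It assumes that each keypoint type owns an embedding vector (standing in for the fixed colors of an OpenPose rendering) and that the vector is painted onto a feature map around the keypoint's pixel location. The function names, the square patch shape, and the Gaussian initialization are all hypothetical choices made for illustration; in a real model the embeddings would be trainable parameters in a framework such as PyTorch.

```python
import random


def init_keypoint_embeddings(num_keypoints, dim, seed=0):
    """Hypothetical helper: one small random vector per keypoint type.

    In a trained model these would be learnable parameters; here they are
    plain lists so the sketch runs with the standard library only.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)]
            for _ in range(num_keypoints)]


def render_pose_map(keypoints, embeddings, height, width, radius=1):
    """Paint each keypoint's embedding onto a height x width x dim map.

    keypoints: list of (x, y) pixel coordinates, or None for a missing
    keypoint. Pixels not covered by any keypoint stay at zero. If patches
    overlap, the later keypoint overwrites the earlier one.
    """
    dim = len(embeddings[0])
    pose_map = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for k, point in enumerate(keypoints):
        if point is None:  # undetected or occluded keypoint
            continue
        cx, cy = point
        # Square patch of side (2 * radius + 1), clipped to the image bounds.
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                pose_map[y][x] = list(embeddings[k])
    return pose_map


# Toy example: three keypoint types, the second one missing.
embeddings = init_keypoint_embeddings(num_keypoints=3, dim=4)
keypoints = [(2, 2), None, (6, 5)]
pose_map = render_pose_map(keypoints, embeddings, height=8, width=8)
```

Because every keypoint type carries its own vector, two keypoints that sit close together remain distinguishable in the map, which is one plausible reading of "discriminative" in the abstract.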

Code & Models

Repositories

dreamxfar/sp-ctrl (PyTorch, official)

Taxonomy

Topics: Human Motion and Animation · Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications

Methods: OpenPose