UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

Yiheng Li; Ruibing Hou; Hong Chang; Shiguang Shan; Xilin Chen

arXiv:2411.16781 · cs.CV · April 1, 2025

TL;DR

UniPose is a versatile framework that uses large language models and pose tokenization to understand, generate, and edit human poses across multiple modalities, enabling broader real-world applications.

Contribution

It introduces the first general-purpose multimodal framework for human pose comprehension, generation, and editing using LLMs and pose tokenization.

Findings

01. Achieves competitive performance across pose tasks
02. Effectively transfers knowledge between tasks
03. Adapts to unseen pose-related tasks

Abstract

Human pose plays a crucial role in the digital age. While recent works have achieved impressive progress in understanding and generating human poses, they often support only a single modality of control signals and operate in isolation, limiting their application in real-world scenarios. This paper presents UniPose, a framework employing Large Language Models (LLMs) to comprehend, generate, and edit human poses across various modalities, including images, text, and 3D SMPL poses. Specifically, we apply a pose tokenizer to convert 3D poses into discrete pose tokens, enabling seamless integration into the LLM within a unified vocabulary. To further enhance its fine-grained pose perception, we equip UniPose with a mixture of visual encoders, including a pose-specific visual encoder. Benefiting from a unified learning strategy, UniPose effectively transfers knowledge…
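
The abstract's idea of turning a pose into discrete tokens that share one vocabulary with text can be illustrated with a small sketch. This is not the authors' implementation: it assumes a nearest-neighbour vector quantizer over a toy codebook, and the names `quantize`, `to_unified_ids`, `from_unified_ids`, and the value of `TEXT_VOCAB_SIZE` are hypothetical. Pose codes are placed after the text token ids so that one id space covers both.

```python
# Illustrative sketch only. Nearest-neighbour vector quantization is an
# ASSUMED tokenization scheme; all names and sizes here are hypothetical.
from typing import List, Sequence

TEXT_VOCAB_SIZE = 32000  # hypothetical size of the LLM's text vocabulary


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each latent pose vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def to_unified_ids(code_indices: Sequence[int],
                   text_vocab_size: int = TEXT_VOCAB_SIZE) -> List[int]:
    """Shift codebook indices past the text ids to form one shared vocabulary."""
    return [text_vocab_size + k for k in code_indices]


def from_unified_ids(token_ids: Sequence[int],
                     text_vocab_size: int = TEXT_VOCAB_SIZE) -> List[int]:
    """Recover codebook indices from unified ids; reject ids in the text range."""
    indices = []
    for t in token_ids:
        if t < text_vocab_size:
            raise ValueError(f"token id {t} is a text token, not a pose token")
        indices.append(t - text_vocab_size)
    return indices


# Toy data: a 4-entry codebook and three latent vectors for one pose.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
latents = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]

pose_codes = quantize(latents, codebook)   # [1, 2, 0]
pose_tokens = to_unified_ids(pose_codes)   # [32001, 32002, 32000]
print(pose_codes, pose_tokens)
```

In a real system the latent vectors would come from a learned pose encoder and the codebook would be trained, but the id-offset step shows why a single language model head can then emit text and pose tokens interchangeably.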

Code & Models

Repositories

liyiheng23/unipose (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Human Motion and Animation