CoT-Pose: Chain-of-Thought Reasoning for 3D Pose Generation from Abstract Prompts
Junuk Cha, Jihyeon Kim

TL;DR
This paper introduces CoT-Pose, a novel framework that uses chain-of-thought reasoning to interpret abstract prompts into accurate 3D human poses, addressing limitations of existing models that rely on detailed prompts.
Contribution
The paper presents a reasoning-based approach for generating 3D human poses from high-level (abstract) prompts, together with a data synthesis pipeline that automatically generates the training data such a model needs.
Findings
CoT-Pose effectively generates plausible 3D poses from abstract prompts.
The reasoning-enhanced model outperforms baseline methods on semantic alignment.
The data pipeline facilitates training with high-level language inputs.
Abstract
Recent advances in multi-modal large language models (MLLMs) and chain-of-thought (CoT) reasoning have led to significant progress in image and text generation tasks. However, the field of 3D human pose generation still faces critical limitations. Most existing text-to-pose models rely heavily on detailed (low-level) prompts that explicitly describe joint configurations. In contrast, humans tend to communicate actions and intentions using abstract (high-level) language. This mismatch results in a practical challenge for deploying pose generation systems in real-world scenarios. To bridge this gap, we introduce a novel framework that incorporates CoT reasoning into the pose generation process, enabling the interpretation of abstract prompts into accurate 3D human poses. We further propose a data synthesis pipeline that automatically generates triplets of abstract prompts, detailed…
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Social Robot Interaction and HRI
