CoT-Pose: Chain-of-Thought Reasoning for 3D Pose Generation from Abstract Prompts

Junuk Cha; Jihyeon Kim

arXiv:2508.07540·cs.CV·August 12, 2025

Open Access

TL;DR

This paper introduces CoT-Pose, a framework that uses chain-of-thought reasoning to translate abstract prompts into accurate 3D human poses, addressing a limitation of existing text-to-pose models, which rely on detailed prompts.

Contribution

The paper presents a reasoning-based approach to 3D pose generation from high-level prompts, together with a data synthesis pipeline for training, with the aim of improving how pose generation models handle abstract language.

Findings

01. CoT-Pose effectively generates plausible 3D poses from abstract prompts.

02. The reasoning-enhanced model outperforms baseline methods on semantic alignment.

03. The data pipeline facilitates training with high-level language inputs.

Abstract

Recent advances in multi-modal large language models (MLLMs) and chain-of-thought (CoT) reasoning have led to significant progress in image and text generation tasks. However, the field of 3D human pose generation still faces critical limitations. Most existing text-to-pose models rely heavily on detailed (low-level) prompts that explicitly describe joint configurations. In contrast, humans tend to communicate actions and intentions using abstract (high-level) language. This mismatch results in a practical challenge for deploying pose generation systems in real-world scenarios. To bridge this gap, we introduce a novel framework that incorporates CoT reasoning into the pose generation process, enabling the interpretation of abstract prompts into accurate 3D human poses. We further propose a data synthesis pipeline that automatically generates triplets of abstract prompts, detailed…

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Social Robot Interaction and HRI