PointLLM-R: Enhancing 3D Point Cloud Reasoning via Chain-of-Thought
Chaoqi Chen, Qile Xu, Wenjun Zhou, Hui Huang

TL;DR
This paper introduces PointLLM-R, a 3D multimodal language model enhanced with Chain-of-Thought reasoning, built using a novel data-centric framework and a large-scale reasoning dataset for improved 3D point cloud understanding.
Contribution
It presents a new data-centric framework for constructing Chain-of-Thought supervision tailored to 3D point cloud understanding and develops PointLLM-R, a reasoning-capable 3D multimodal model.
Findings
PointLLM-R achieves state-of-the-art results in 3D classification and captioning.
The dataset PoCoTI contains 55K samples with explicit reasoning paths.
PointLLM-R generalizes well to real-world scanned point clouds and dialogue scenarios.
Abstract
Understanding 3D point clouds through language remains a fundamental challenge in computer graphics and visual computing, due to the irregular structure of point cloud data and the lack of explicit reasoning in existing 3D multimodal models. While Chain-of-Thought (CoT) reasoning has shown strong effectiveness in LLMs and image-based MLLMs, its extension to 3D understanding remains largely underexplored. In this paper, we propose a data-centric framework for constructing large-scale CoT supervision tailored to 3D point cloud understanding. Our framework consists of a two-stage pipeline that first refines point-text instruction data via vision-language-model-based quality evaluation and reference-guided refinement, and then synthesizes high-quality reasoning paths through Human-in-the-Loop Prompt Optimization (HiLPO). Using this approach, we build PoCoTI, a CoT-enhanced point-text…
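The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. All names (`Sample`, `refine_stage`, `synthesize_stage`, `build_dataset`, the `toy_*` stand-ins) and the score threshold are hypothetical. The abstract does not say whether low-scoring samples are rewritten or discarded, so rewriting is assumed here. The human-in-the-loop part of HiLPO is not modeled: the optimized prompt is simply passed in as a fixed string.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Sample:
    instruction: str
    response: str
    reference: str                   # reference text used to guide refinement
    reasoning: Optional[str] = None  # filled in by stage 2


def refine_stage(samples: List[Sample],
                 score_fn: Callable[[Sample], float],
                 refine_fn: Callable[[Sample], str],
                 threshold: float) -> List[Sample]:
    """Stage 1 (assumed form): score each sample, rewrite the low-scoring ones."""
    refined = []
    for s in samples:
        if score_fn(s) >= threshold:
            refined.append(s)
        else:
            refined.append(Sample(s.instruction, refine_fn(s), s.reference))
    return refined


def synthesize_stage(samples: List[Sample],
                     cot_fn: Callable[[str, Sample], str],
                     prompt: str) -> List[Sample]:
    """Stage 2 (assumed form): attach a reasoning path generated with a fixed prompt."""
    return [Sample(s.instruction, s.response, s.reference, cot_fn(prompt, s))
            for s in samples]


def build_dataset(samples, score_fn, refine_fn, cot_fn, threshold, prompt):
    return synthesize_stage(refine_stage(samples, score_fn, refine_fn, threshold),
                            cot_fn, prompt)


# Toy stand-ins for the model calls; a real pipeline would query a VLM / LLM here.
def toy_score(s: Sample) -> float:
    ref_words = set(s.reference.lower().split())
    resp_words = set(s.response.lower().split())
    return len(ref_words & resp_words) / len(ref_words)


def toy_refine(s: Sample) -> str:
    return s.reference


def toy_cot(prompt: str, s: Sample) -> str:
    return f"{prompt} {s.instruction} -> {s.response}"
```

Passing the scoring, refinement, and reasoning steps in as callables keeps the sketch independent of any particular model API, which the abstract does not specify.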
