CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

Siyu Wang; Cailian Chen; Xinyi Le; Qimin Xu; Lei Xu; Yanzhou Zhang; Jie Yang

arXiv:2412.19663 · cs.CV · June 24, 2025

TL;DR

CAD-GPT is a multimodal large language model that adds spatial reasoning to CAD model synthesis. It infers 3D spatial location and orientation more accurately from a single image or a text description, and it outperforms existing methods.

Contribution

Introduces CAD-GPT with a 3D spatial mechanism that improves spatial inference accuracy in CAD synthesis from images or text descriptions.

Findings

01 Outperforms state-of-the-art CAD synthesis methods

02 Achieves higher accuracy in spatial positioning and orientation

03 Demonstrates robustness across various input modalities

Abstract

Computer-aided design (CAD) significantly enhances the efficiency, accuracy, and innovation of design processes by enabling precise 2D and 3D modeling, extensive analysis, and optimization. Existing methods for creating CAD models rely on latent vectors or point clouds, which are difficult to obtain and costly to store. Recent advances in Multimodal Large Language Models (MLLMs) have inspired researchers to use natural language instructions and images for CAD model construction. However, these models still struggle to infer accurate 3D spatial location and orientation, which leads to inaccurate 3D starting points and extrusion directions when constructing geometries. This work introduces CAD-GPT, a CAD synthesis method with a spatial reasoning-enhanced MLLM that takes either a single image or a textual description as input. To achieve precise…

Taxonomy

Topics: Natural Language Processing Techniques · Manufacturing Process and Optimization · BIM and Construction Integration