CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs
Siyu Wang, Cailian Chen, Xinyi Le, Qimin Xu, Lei Xu, Yanzhou Zhang, Jie Yang

TL;DR
CAD-GPT is a multimodal large language model for CAD model synthesis that incorporates spatial reasoning, enabling more accurate 3D spatial inference from a single image or a text description. The authors report that it outperforms existing methods.
Contribution
Introduces CAD-GPT with a 3D spatial mechanism that improves spatial inference accuracy in CAD synthesis from images or text descriptions.
Findings
Outperforms state-of-the-art CAD synthesis methods
Achieves higher accuracy in spatial positioning and orientation
Demonstrates robustness across various input modalities
Abstract
Computer-aided design (CAD) significantly enhances the efficiency, accuracy, and innovation of design processes by enabling precise 2D and 3D modeling, extensive analysis, and optimization. Existing methods for creating CAD models rely on latent vectors or point clouds, which are difficult to obtain and costly to store. Recent advances in Multimodal Large Language Models (MLLMs) have inspired researchers to use natural language instructions and images for CAD model construction. However, these models still struggle to infer accurate 3D spatial location and orientation, which leads to errors in the 3D starting points and extrusion directions used to construct geometries. This work introduces CAD-GPT, a CAD synthesis method with spatial reasoning-enhanced MLLM that takes either a single image or a textual description as input. To achieve precise…
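To make concrete why errors in spatial location and orientation matter, here is a minimal sketch that is not taken from the paper. It assumes a generic sketch-and-extrude step in which a sketch plane is described by an origin and three rotation angles (applied about x, then y, then z), and the extrusion runs along the rotated plane normal. All function names and the angle convention are hypothetical, chosen only for illustration.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def mat_vec(m, v):
    # Multiply a 3x3 matrix by a 3-vector.
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def extrusion_direction(ax, ay, az):
    # Assumed convention: start from the local z-axis (the sketch-plane
    # normal) and rotate about x, then y, then z. Angles are in radians.
    v = [0.0, 0.0, 1.0]
    for m in (rot_x(ax), rot_y(ay), rot_z(az)):
        v = mat_vec(m, v)
    return v

def extrude_end_point(origin, angles, distance):
    # Point reached by extruding `distance` from `origin` along the normal.
    d = extrusion_direction(*angles)
    return [o + distance * c for o, c in zip(origin, d)]

def end_point_error(origin, angles, predicted_angles, distance):
    # Euclidean gap between the intended and the mispredicted end points.
    p = extrude_end_point(origin, angles, distance)
    q = extrude_end_point(origin, predicted_angles, distance)
    return math.dist(p, q)
```

Under these assumptions, an orientation error of angle e moves the far end of an extrusion of length d by 2·d·sin(e/2), so even a few degrees of misprediction visibly displaces the resulting solid, and an error in the starting point shifts the whole feature by the same amount.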
Taxonomy
Topics: Natural Language Processing Techniques · Manufacturing Process and Optimization · BIM and Construction Integration
