CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
Jingwei Xu, Chenyu Wang, Zibo Zhao, Wen Liu, Yi Ma, Shenghua Gao

TL;DR
CAD-MLLM generates parametric CAD models conditioned on text, images, point clouds, or a combination of them. It uses a large language model to align these inputs with CAD command sequences and is trained on the new multimodal Omni-CAD dataset.
Contribution
The paper introduces CAD-MLLM, the first system to unify multimodal inputs for CAD generation, supported by the novel Omni-CAD dataset and specialized evaluation metrics.
Findings
Outperforms existing conditional generative methods
Robust to noise and missing data
Achieves high-quality topology and surface enclosure
Abstract
This paper aims to design a unified Computer-Aided Design (CAD) generation system that can easily generate CAD models based on the user's inputs in the form of textual description, images, point clouds, or even a combination of them. Towards this goal, we introduce CAD-MLLM, the first system capable of generating parametric CAD models conditioned on multimodal input. Specifically, within the CAD-MLLM framework, we leverage the command sequences of CAD models and then employ advanced large language models (LLMs) to align the feature space across these diverse multimodal data and the CAD models' vectorized representations. To facilitate model training, we design a comprehensive data construction and annotation pipeline that equips each CAD model with corresponding multimodal data. Our resulting dataset, named Omni-CAD, is the first multimodal CAD dataset that contains…
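The abstract describes CAD models as command sequences that a language model can consume. A minimal sketch of that idea follows, purely for illustration: the command vocabulary (`line`, `extrude`), the normalization of parameters to [0, 1], the 256-bin quantization, and the `Command`, `quantize`, and `serialize` names are all assumptions made here, not the paper's actual representation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Command:
    """One step of a parametric CAD construction history (hypothetical schema)."""
    name: str                   # e.g. "line" or "extrude"; vocabulary is assumed
    params: Tuple[float, ...]   # continuous parameters, assumed normalized to [0, 1]


def quantize(value: float, bins: int = 256) -> int:
    """Map a value in [0, 1] to an integer bin index in [0, bins - 1]."""
    clamped = min(max(value, 0.0), 1.0)
    return min(int(clamped * bins), bins - 1)


def serialize(commands: List[Command], bins: int = 256) -> str:
    """Flatten a command sequence into a plain-text string an LLM tokenizer could read."""
    parts = []
    for cmd in commands:
        ints = " ".join(str(quantize(p, bins)) for p in cmd.params)
        parts.append(f"{cmd.name} {ints}".strip())
    return " ; ".join(parts)


# A unit-square sketch followed by an extrusion (illustrative data only).
square_prism = [
    Command("line", (0.0, 0.0, 1.0, 0.0)),
    Command("line", (1.0, 0.0, 1.0, 1.0)),
    Command("line", (1.0, 1.0, 0.0, 1.0)),
    Command("line", (0.0, 1.0, 0.0, 0.0)),
    Command("extrude", (0.5,)),
]
text = serialize(square_prism)
```

Quantizing parameters into a fixed number of bins is one common way to turn continuous geometry into discrete tokens. Whether CAD-MLLM does exactly this is not stated in the text above.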
Taxonomy
Topics: Manufacturing Process and Optimization · Advanced Measurement and Metrology Techniques · BIM and Construction Integration
Methods: ALIGN · Focus
