Generating CAD Code with Vision-Language Models for 3D Designs
Kamel Alrashedy, Pradyumna Tambwekar, Zulfiqar Zaidi, Megan Langwasser, Wei Xu, Matthew Gombolay

TL;DR
This paper presents CADCodeVerify, a novel iterative verification method using Vision-Language Models to improve the accuracy of AI-generated 3D CAD objects from natural language prompts, supported by a new benchmark.
Contribution
Introduces CADCodeVerify, a feedback-driven approach that enhances CAD code generation accuracy using visual validation, and CADPrompt, a benchmark for evaluating CAD code generation.
Findings
CADCodeVerify improves 3D object structure quality.
Achieves a 7.30% reduction in Point Cloud distance.
Increases success rate of CAD code compilation by 5%.
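To make the two reported metrics concrete, here is a minimal sketch of how a point cloud distance and a compilation success rate could be computed. It rests on assumptions: the summary does not define "Point Cloud distance", so a symmetric Chamfer distance stands in for it, and the function names `chamfer_distance`, `relative_reduction`, and `compile_rate` are hypothetical. The summary also does not say whether the reported changes are relative or absolute; the helper below shows only the relative reading.

```python
import math
from typing import Sequence

Point = Sequence[float]


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance (an assumed stand-in for the paper's metric).

    For each point, take the distance to its nearest neighbour in the other
    cloud, average per direction, and add the two directions together.
    """
    if not a or not b:
        raise ValueError("both point clouds must be non-empty")

    def one_way(src: Sequence[Point], dst: Sequence[Point]) -> float:
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


def compile_rate(outcomes: Sequence[bool]) -> float:
    """Percentage of generated scripts that executed without error."""
    return 100.0 * sum(outcomes) / len(outcomes)
```

A brute-force nearest-neighbour search like this is quadratic in the number of points, which is fine for illustration but would normally be replaced by a spatial index for dense clouds.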
Abstract
Generative AI has transformed the fields of Design and Manufacturing by providing efficient and automated methods for generating and modifying 3D objects. One approach involves using Large Language Models (LLMs) to generate Computer-Aided Design (CAD) scripting code, which can then be executed to render a 3D object; however, the resulting 3D object may not meet the specified requirements. Testing the correctness of CAD generated code is challenging due to the complexity and structure of 3D objects (e.g., shapes, surfaces, and dimensions) that are not feasible in code. In this paper, we introduce CADCodeVerify, a novel approach to iteratively verify and improve 3D objects generated from CAD code. Our approach works by producing ameliorative feedback by prompting a Vision-Language Model (VLM) to generate and answer a set of validation questions to verify the generated object and prompt…
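The verify-and-refine idea described in the abstract can be sketched as a small control loop. This is an illustration under stated assumptions, not the authors' implementation: the abstract is truncated, so the stopping rule, the feedback wording, and the round limit are guesses, and the four callables (`generate`, `render`, `make_questions`, `answer`) are hypothetical placeholders for an LLM code generator, a CAD executor and renderer, and a VLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RefineResult:
    code: str          # last CAD script produced
    rounds_used: int   # refinement rounds performed before stopping
    compiled: bool     # whether the last script executed and rendered


def refine_cad_code(
    prompt: str,
    generate: Callable[[str, Optional[str], Optional[str]], str],
    render: Callable[[str], Optional[object]],
    make_questions: Callable[[str], List[str]],
    answer: Callable[[str, object], str],
    max_rounds: int = 2,  # assumed limit, not taken from the source
) -> RefineResult:
    # Hypothetical contract: generate(prompt, previous_code, feedback) -> code;
    # render(code) returns images, or None if the script fails to execute.
    code = generate(prompt, None, None)
    for round_index in range(max_rounds):
        images = render(code)
        if images is None:
            feedback = "The script failed to execute; fix it and try again."
        else:
            # The VLM writes validation questions from the prompt, then
            # answers each one against the rendered object.
            questions = make_questions(prompt)
            unmet = [q for q in questions
                     if answer(q, images).strip().lower() != "yes"]
            if not unmet:
                return RefineResult(code, round_index, True)
            feedback = "These checks did not pass: " + "; ".join(unmet)
        code = generate(prompt, code, feedback)
    return RefineResult(code, max_rounds, render(code) is not None)
```

Passing the model calls in as plain callables keeps the loop testable with stubs and independent of any particular LLM, VLM, or CAD library.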
Peer Reviews
Decision · ICLR 2025 Poster
- Novelty and Originality: The integration of a question-answering-based VLM to refine the object quality.
- Evaluation: The increase in the success rate of compilable outputs at the end of the refinement shows that the method can improve the generation of CAD code from language prompts.
- Soundness: The technical approach is sound and could be applied to a lot of CAD or CAV tasks. The approach is a specification refinement method using generative AI. From an initial specification, an initial object

- Dataset Scope: CADPrompt could have been better introduced, showing the most complex objects in terms of complexity and difficulty metrics.
- Complexity Metric for Objects: This paper measures object complexity by counting vertices and faces. However, using metrics like bounding boxes or decomposed bounding volumes could better reflect structural complexity. Vertices mainly define shape details, not true complexity: a simple shape like a cube can have many polygons, while complex shapes like a

- The paper comes with a new benchmark suite (CADPrompt), a well-curated, crowd-sourced benchmark with annotations and quality checks, which is a valuable resource for assessing CAD code generation and refinement methods.
- CADCodeVerify uses an interesting novel idea to eliminate the need for human involvement by generating validation questions and answering them using VLMs.
- The geometric solver-based baseline is very interesting and gives an upper estimate of the self-refinement proc

- The fundamental differences between CADCodeVerify and 3D-Premise are unclear. For example, it’s not specified whether 3D-Premise uses execution-error-based repair. Also, both approaches seem to use totally different prompts, so it is not clear if it is just a matter of better prompting or something fundamental (such as the question-answer-based method).
- The paper would be stronger if the approach was also evaluated on the 3D-Premise dataset.
- Some other details are missing (see below question

- The contribution of a language-to-CAD dataset is useful for the community, and hiring CAD experts to write the CAD code for it means it's likely high quality.
- The comparison is run across a range of LLMs, including open-source ones (CodeLlama).
- They generally surpass the 3D-Premise baseline and score fairly close to the geometric solver baseline that gets to cheat and use the ground truth CAD model in calculating its feedback. They score particularly well on the more difficult problems, and

- The term "success rate" is a bit of a misleading name for something that means "compilation rate" or "compilation success rate" – it gives me the impression that the model succeeded at solving the task, not that it generated an arbitrary piece of code that compiled. For example, in the abstract "increasing the success rate of the compiled program" or in the intro "5.5% increase in successful object generation" all gave me this impression until I dug into it.
- Also, the abstract says 5.0% a
Taxonomy
Topics: Manufacturing Process and Optimization · 3D Surveying and Cultural Heritage
Methods: Attention Is All You Need · Sparse Evolutionary Training · Dense Connections · Residual Connection · Position-Wise Feed-Forward Layer · Adam · Linear Layer · Label Smoothing · Dropout · Byte Pair Encoding
