TL;DR
Seek-CAD introduces a self-refined, training-free generative modeling approach for 3D parametric CAD using local open-source LLMs and visual feedback, advancing CAD design automation.
Contribution
This paper pioneers the integration of visual and Chain-of-Thought feedback in a self-refinement mechanism for CAD model generation using open-source LLMs.
Findings
Effective CAD model generation validated by experiments
Self-refinement improves model accuracy and quality
New SSR-based CAD dataset supports industrial applications
Abstract
Computer-Aided Design (CAD) generative modeling is poised to significantly transform the design of industrial products, and recent research has extended into Large Language Models (LLMs). In contrast to fine-tuning methods, training-free approaches typically rely on advanced closed-source LLMs, which offers greater flexibility and efficiency when developing AI agents that generate CAD parametric models. However, the substantial cost of top-tier closed-source LLMs and the limits on deploying them locally pose challenges in practical applications. Seek-CAD is a pioneering exploration of the locally deployed open-source inference LLM DeepSeek-R1 for CAD parametric model generation with a training-free methodology. This study is the first investigation to incorporate both visual and Chain-of-Thought (CoT) feedback within the self-refinement…
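To make the idea of training-free self-refinement with visual and CoT feedback concrete, here is a minimal sketch of such a loop. It is not the authors' implementation: the function name `self_refine`, the three callables (`generate`, `render_steps`, `critique`), their signatures, and the default of two rounds are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple


def self_refine(
    prompt: str,
    generate: Callable[[str, Optional[str], Optional[str]], Tuple[str, str]],
    render_steps: Callable[[str], List[str]],
    critique: Callable[[str, str, List[str]], Tuple[bool, str]],
    max_rounds: int = 2,
) -> str:
    """Hypothetical refinement loop; every callable is a stand-in.

    generate(prompt, previous_code, feedback) -> (cad_code, chain_of_thought)
    render_steps(cad_code) -> one rendered image (path or id) per build step
    critique(prompt, chain_of_thought, images) -> (accepted, feedback_text)
    """
    # Initial attempt: no previous code and no feedback yet.
    cad_code, cot = generate(prompt, None, None)
    for _ in range(max_rounds):
        # Render intermediate build steps rather than only the final shape.
        images = render_steps(cad_code)
        # A vision-language model would compare the renders with the CoT here.
        accepted, feedback = critique(prompt, cot, images)
        if accepted:
            break
        # Ask the LLM to revise its previous output using the feedback.
        cad_code, cot = generate(prompt, cad_code, feedback)
    return cad_code
```

In this sketch no model weights are updated; the only thing that changes between rounds is the feedback text passed back to the generator, which is what "training-free" is taken to mean here.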
Peer Reviews
Decision·ICLR 2026 Poster
The proposed training-free approach is novel and among the first few works to explore this direction. The SSR data format with CapType reference is also more general and more applicable to real-world scenarios than simple sketch-and-extrude. SVF is also a nice solution for incorporating step-wise visual feedback into the system, enabling the model to verify each step in the build process. Evaluation on the new dataset demonstrates the improvement of Seek-CAD.
Writing and paper layout can be improved. Figure 1 is too small, and the SSR definition appears in a later paragraph even though many references to it come earlier. Overall, this makes reading the paper more difficult than it should be. Evaluation is done entirely on the authors' new SSR dataset. There is no comparison to previous methods on existing public CAD data like DeepCAD / Omni-CAD / WHUCAD. Figures 7 and 8 show their dataset is much more complex than DeepCAD; this raises the concern that metric
1. The use of step-wise visual renders paired with the LLM's Chain-of-Thought for feedback is new. This provides a richer, more granular signal for refinement than methods using only the final render. 2. The proposed SSR triple and the CapType reference mechanism enable the generation of complex CAD models beyond the limitations of prior "Sketch-Extrude" methods.
1. The paper mentions that models failing to compile are excluded from metric calculation. A more detailed analysis of the reasons for compilation failures would be insightful. 2. While the CapType mechanism is innovative, the description in the appendix mentions that when refinement commands involve primitives not identifiable by CapType, those primitives are simply excluded. How often does this happen in the dataset/generation? Does it lead to models that are missing intended refinements? 3. T
1- Method novelty in feedback: evaluates intermediate renders + CoT rather than final-image only; ablations show inter-image cues matter. 2- Empirical signal: Seek-CAD beats prior training-free refiners and edges a tuned model on geometric fidelity (CD/HD/IoGT), with qualitative evidence.
1- VLM Feedback Quality: The authors acknowledge that VLMs struggle with geometric descriptions without domain-specific training (Section 5.5), but this fundamental limitation undermines the core refinement mechanism. No quantitative analysis is provided on how often Gemini-2.0's feedback is actually helpful vs. harmful. 2- Compilation Failure Rate: The Pass@k metric reveals concerning compilation failure rates. Even after 2 refinement rounds, only 55% of generated models compile successfully (
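The geometric-fidelity metrics cited in this review (CD and HD, i.e. Chamfer and Hausdorff distance) can be illustrated on sampled point clouds. The sketch below uses one common convention (symmetric Chamfer distance on squared Euclidean distances, symmetric Hausdorff on plain distances); the paper's exact definitions, normalization, and sampling density are not given here, so treat the function names and conventions as assumptions.

```python
import numpy as np


def pairwise_sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between every point in a (N, 3) and b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff * diff, axis=-1)


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance: mean squared nearest-neighbour distance, both ways."""
    d2 = pairwise_sq_dists(a, b)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def hausdorff_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Hausdorff distance: the worst nearest-neighbour distance, both ways."""
    d = np.sqrt(pairwise_sq_dists(a, b))
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


# Toy example: the second cloud has one extra point far from the first.
generated = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
cd = chamfer_distance(generated, reference)
hd = hausdorff_distance(generated, reference)
```

Chamfer distance averages over all points and so reflects overall shape agreement, while Hausdorff distance is driven by the single worst-matched point, which is why the two are often reported together.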
