Leveraging Vision-Language Models for Manufacturing Feature Recognition in CAD Designs
Muhammad Tayyab Khan, Lequn Chen, Ye Han Ng, Wenhe Feng, Nicholas Yew Jin Tan, Seung Ki Moon

TL;DR
This paper explores vision-language models combined with prompt engineering to automate manufacturing feature recognition in CAD designs without extensive training data; the best-performing model reaches 74% feature quantity accuracy.
Contribution
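The paper introduces an approach that applies VLMs with prompt engineering techniques (multi-view query images, few-shot learning, sequential reasoning, and chain-of-thought) to CAD feature recognition, reducing reliance on large training datasets and predefined geometric rules.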
Findings
Claude-3.5-Sonnet achieves 74% feature quantity accuracy.
GPT-4o has the lowest hallucination rate at 8%.
Open-source models show higher hallucination rates and lower accuracy than the proprietary models.
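This summary does not define the two metrics above, so the following is only a minimal sketch of one plausible reading, not the paper's evaluation code. It assumes that feature quantity accuracy is the fraction of ground-truth feature types whose predicted count matches exactly, and that hallucination rate is the fraction of predicted feature instances whose type does not appear in the ground truth. The function names and the sample data are hypothetical.

```python
from collections import Counter


def quantity_accuracy(predicted, actual):
    """Fraction of ground-truth feature types whose predicted count matches exactly.

    Assumed definition for illustration; the paper may define it differently.
    """
    if not actual:
        return 0.0
    matches = sum(
        1 for name, count in actual.items() if predicted.get(name, 0) == count
    )
    return matches / len(actual)


def hallucination_rate(predicted, actual):
    """Fraction of predicted feature instances whose type is absent from the ground truth.

    Assumed definition for illustration; the paper may define it differently.
    """
    total = sum(predicted.values())
    if total == 0:
        return 0.0
    spurious = sum(count for name, count in predicted.items() if name not in actual)
    return spurious / total


# Hypothetical labels for a single CAD design.
actual = Counter({"hole": 4, "slot": 2, "chamfer": 1})
predicted = Counter({"hole": 4, "slot": 1, "chamfer": 1, "pocket": 2})

print(f"quantity accuracy:  {quantity_accuracy(predicted, actual):.2f}")
print(f"hallucination rate: {hallucination_rate(predicted, actual):.2f}")
```

In this made-up example the model counts holes and chamfers correctly but misses one slot and reports two pockets that do not exist, so two of three feature types match and two of eight predicted instances are spurious.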
Abstract
Automatic feature recognition (AFR) is essential for transforming design knowledge into actionable manufacturing information. Traditional AFR methods, which rely on predefined geometric rules and large datasets, are often time-consuming and lack generalizability across various manufacturing features. To address these challenges, this study investigates vision-language models (VLMs) for automating the recognition of a wide range of manufacturing features in CAD designs without the need for extensive training datasets or predefined rules. Instead, prompt engineering techniques, such as multi-view query images, few-shot learning, sequential reasoning, and chain-of-thought, are applied to enable recognition. The approach is evaluated on a newly developed CAD dataset containing designs of varying complexity relevant to machining, additive manufacturing, sheet metal forming, molding, and…
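The prompt engineering techniques named in the abstract can be pictured as the assembly of a single multimodal prompt. The sketch below is an illustration under stated assumptions, not the authors' prompts or any vendor's API schema: the message layout (a list of `text` and `image` parts), the instruction wording, and the names `build_prompt` and `encode_image` are all hypothetical. It combines few-shot examples, several views of the query design, and a chain-of-thought instruction.

```python
import base64

# Hypothetical instruction text; the paper's actual prompts are not given in this summary.
INSTRUCTIONS = (
    "You are given several views of a CAD design. "
    "List each manufacturing feature and how many times it occurs."
)
CHAIN_OF_THOUGHT = (
    "Reason step by step: inspect each view in turn, note the features you see, "
    "reconcile the views, then give the final feature names and quantities."
)


def encode_image(image_bytes):
    """Base64-encode raw image bytes so they can travel in a JSON-style payload."""
    return base64.b64encode(image_bytes).decode("ascii")


def build_prompt(query_views, few_shot_examples):
    """Assemble a provider-agnostic list of prompt parts (assumed layout).

    query_views: dict mapping a view name (e.g. "front") to image bytes.
    few_shot_examples: list of dicts with "views" (list of image bytes)
        and "answer" (the expected feature list as text).
    """
    parts = [{"type": "text", "text": INSTRUCTIONS}]

    # Few-shot learning: worked examples, each with its images and known answer.
    for example in few_shot_examples:
        for view in example["views"]:
            parts.append({"type": "image", "data": encode_image(view)})
        parts.append({"type": "text", "text": "Features: " + example["answer"]})

    # Multi-view query images: every view of the design under test, labeled.
    for name, view in query_views.items():
        parts.append({"type": "text", "text": f"Query view: {name}"})
        parts.append({"type": "image", "data": encode_image(view)})

    # Chain-of-thought: ask for intermediate reasoning before the final answer.
    parts.append({"type": "text", "text": CHAIN_OF_THOUGHT})
    return parts


# Placeholder bytes stand in for rendered CAD images.
examples = [{"views": [b"example-front"], "answer": "hole x2, slot x1"}]
views = {"front": b"query-front", "top": b"query-top"}
prompt = build_prompt(views, examples)
print(len(prompt), "prompt parts")
```

The resulting list would still need to be translated into whatever request format a given VLM provider expects; that step is deliberately left out because it differs between providers.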
Taxonomy
Topics: Manufacturing Process and Optimization · 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction
Methods: Masked autoencoder
