Leveraging Vision-Language Models for Manufacturing Feature Recognition   in CAD Designs

Muhammad Tayyab Khan; Lequn Chen; Ye Han Ng; Wenhe Feng; Nicholas Yew; Jin Tan; Seung Ki Moon

arXiv:2411.02810·cs.CE·November 6, 2024·2 cites

Leveraging Vision-Language Models for Manufacturing Feature Recognition in CAD Designs

Muhammad Tayyab Khan, Lequn Chen, Ye Han Ng, Wenhe Feng, Nicholas Yew, Jin Tan, Seung Ki Moon

PDF

Open Access

TL;DR

This paper explores the use of vision-language models with prompt engineering to automate manufacturing feature recognition in CAD designs, achieving promising accuracy without extensive training data.

Contribution

It introduces a novel approach using VLMs and prompt techniques for CAD feature recognition, reducing reliance on large datasets and predefined rules.

Findings

01

Claude-3.5-Sonnet achieves 74% feature quantity accuracy.

02

GPT-4o has the lowest hallucination rate at 8%.

03

Open-source models show higher hallucination rates and lower accuracy.

Abstract

Automatic feature recognition (AFR) is essential for transforming design knowledge into actionable manufacturing information. Traditional AFR methods, which rely on predefined geometric rules and large datasets, are often time-consuming and lack generalizability across various manufacturing features. To address these challenges, this study investigates vision-language models (VLMs) for automating the recognition of a wide range of manufacturing features in CAD designs without the need for extensive training datasets or predefined rules. Instead, prompt engineering techniques, such as multi-view query images, few-shot learning, sequential reasoning, and chain-of-thought, are applied to enable recognition. The approach is evaluated on a newly developed CAD dataset containing designs of varying complexity relevant to machining, additive manufacturing, sheet metal forming, molding, and…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsManufacturing Process and Optimization · 3D Surveying and Cultural Heritage · Image Processing and 3D Reconstruction

MethodsMasked autoencoder