Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction
Muhammad Tayyab Khan, Lequn Chen, Ye Han Ng, Wenhe Feng, Nicholas Yew, Jin Tan, Seung Ki Moon

TL;DR
This paper presents a fine-tuned open-source vision-language model, Florence-2, for automated extraction of GD&T information from engineering drawings, demonstrating significant improvements over closed-source models in accuracy and hallucination reduction.
Contribution
The paper introduces a domain-specific fine-tuning approach for Florence-2, a smaller open-source VLM, to efficiently extract GD&T data from engineering drawings, outperforming larger closed-source models.
Findings
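Fine-tuned Florence-2 achieved a 29.95% increase in precision over the closed-source models evaluated.
Its F1-score improved by 52.40% against the same closed-source baseline.
Its hallucination rate was reduced by 43.15% against the same closed-source baseline.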
Abstract
Geometric Dimensioning and Tolerancing (GD&T) plays a critical role in manufacturing by defining acceptable variations in part features to ensure component quality and functionality. However, extracting GD&T information from 2D engineering drawings is a time-consuming and labor-intensive task, often relying on manual efforts or semi-automated tools. To address these challenges, this study proposes an automated and computationally efficient GD&T extraction method by fine-tuning Florence-2, an open-source vision-language model (VLM). The model is trained on a dataset of 400 drawings with ground truth annotations provided by domain experts. For comparison, two state-of-the-art closed-source VLMs, GPT-4o and Claude-3.5-Sonnet, are evaluated on the same dataset. All models are assessed using precision, recall, F1-score, and hallucination metrics. Due to the computational cost and…
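The four metrics named in the abstract can be illustrated with a minimal sketch. It assumes each drawing's output is a list of extracted GD&T strings compared against the expert annotation by exact match, and it assumes the hallucination rate is the fraction of predicted items that do not appear in the ground truth. The paper's exact matching rule and hallucination definition are not given in this summary, so the function name, the string format, and that definition are all hypothetical.

```python
from collections import Counter


def score_extraction(predicted, ground_truth):
    """Score one drawing's extracted GD&T items against expert annotations.

    Assumption: items are compared by exact string match, with duplicates
    counted (multiset comparison). The paper may use a different rule.
    """
    pred = Counter(predicted)
    gold = Counter(ground_truth)

    tp = sum((pred & gold).values())   # items present in both lists
    fp = sum(pred.values()) - tp       # predicted but not annotated
    fn = sum(gold.values()) - tp       # annotated but not predicted

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Assumption: hallucination rate = share of predictions with no match
    # in the ground truth. This is a stand-in, not the paper's definition.
    hallucination_rate = fp / (tp + fp) if (tp + fp) else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hallucination_rate": hallucination_rate,
    }


# Hypothetical example: three predicted callouts, three annotated callouts.
predicted = ["position 0.1 A B", "flatness 0.05", "perpendicularity 0.2 A"]
ground_truth = ["position 0.1 A B", "flatness 0.05", "parallelism 0.1 B"]
scores = score_extraction(predicted, ground_truth)
```

Under this assumed definition the hallucination rate is simply one minus precision. A stricter definition, for example one that counts only callouts with no counterpart anywhere on the drawing, would need a separate matching step.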
Taxonomy
Topics: BIM and Construction Integration · 3D Surveying and Cultural Heritage · Manufacturing Process and Optimization
