Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction

Muhammad Tayyab Khan; Lequn Chen; Ye Han Ng; Wenhe Feng; Nicholas Yew; Jin Tan; Seung Ki Moon

arXiv:2411.03707 · cs.CV · November 7, 2024

Open Access

TL;DR

This paper fine-tunes Florence-2, an open-source vision-language model, to automatically extract geometric dimensioning and tolerancing (GD&T) information from engineering drawings, and reports higher accuracy and fewer hallucinations than closed-source models.

Contribution

The paper introduces a domain-specific fine-tuning approach for Florence-2, a smaller open-source VLM, to extract GD&T data from engineering drawings efficiently. The fine-tuned model outperforms larger closed-source models.

Findings

01

Fine-tuned Florence-2 achieved a 29.95% increase in precision over the closed-source baselines.

02

Its F1-score improved by 52.40% against the same baselines.

03

Its hallucination rate was reduced by 43.15%.

Abstract

Geometric Dimensioning and Tolerancing (GD&T) plays a critical role in manufacturing by defining acceptable variations in part features to ensure component quality and functionality. However, extracting GD&T information from 2D engineering drawings is a time-consuming and labor-intensive task, often relying on manual efforts or semi-automated tools. To address these challenges, this study proposes an automated and computationally efficient GD&T extraction method by fine-tuning Florence-2, an open-source vision-language model (VLM). The model is trained on a dataset of 400 drawings with ground truth annotations provided by domain experts. For comparison, two state-of-the-art closed-source VLMs, GPT-4o and Claude-3.5-Sonnet, are evaluated on the same dataset. All models are assessed using precision, recall, F1-score, and hallucination metrics. Due to the computational cost and…
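
The four metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes each drawing yields a list of extracted GD&T strings that are compared to the expert annotations by exact match, and it assumes the hallucination rate is the share of predictions with no match in the ground truth. Neither the matching rule nor that definition is stated in the text above, and the function name and sample strings are hypothetical.

```python
from collections import Counter


def extraction_metrics(predicted, ground_truth):
    """Score one drawing's extracted GD&T strings against expert annotations.

    Exact string matching and the hallucination definition below are
    assumptions for illustration; the paper's own rules may differ.
    """
    pred, truth = Counter(predicted), Counter(ground_truth)
    tp = sum((pred & truth).values())  # present in both (multiset intersection)
    fp = sum((pred - truth).values())  # predicted but not annotated
    fn = sum((truth - pred).values())  # annotated but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Assumed definition: fraction of predictions absent from the ground truth.
    hallucination = fp / (tp + fp) if tp + fp else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "hallucination": hallucination}


# Hypothetical annotations, not taken from the paper's dataset.
truth = ["POSITION 0.05 A B", "FLATNESS 0.02",
         "PERPENDICULARITY 0.1 A", "DIAMETER 10 +/-0.1"]
pred = ["POSITION 0.05 A B", "FLATNESS 0.02", "PERPENDICULARITY 0.2 A"]
scores = extraction_metrics(pred, truth)
print(scores)
```

Here two of the three predictions match the four annotations, so precision is 2/3 and recall is 1/2. Under the assumed definition the hallucination rate is simply one minus precision; a study may instead count only fabricated items that appear nowhere on the drawing, which would need a separate label per prediction.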

Taxonomy

Topics: BIM and Construction Integration · 3D Surveying and Cultural Heritage · Manufacturing Process and Optimization