Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer

Muhammad Tayyab Khan; Zane Yong; Lequn Chen; Jun Ming Tan; Wenhe Feng; and Seung Ki Moon

arXiv:2505.01530 · cs.CV · September 4, 2025

TL;DR

This paper introduces a hybrid deep learning framework that pairs Oriented Bounding Box (OBB) detection with transformer-based document parsing to extract structured information from complex 2D engineering drawings, replacing slow manual extraction and the unstructured output of traditional OCR.

Contribution

It integrates YOLOv11 for OBB detection with Donut for parsing the detected regions, and reports that a single model covering all categories outperforms category-specific models at extracting detailed drawing information.

Findings

01 Achieved 94.77% precision in GD&T detection

02 Attained 100% recall in most categories

03 Reached an F1 score of 97.3%, with a hallucination rate of 5.23%
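
The reported figures can be cross-checked with a little arithmetic. The sketch below is not taken from the paper: it assumes that the F1 score refers to the GD&T category, that GD&T is among the categories with 100% recall, and that the hallucination rate is defined as the complement of precision. Under those assumptions the numbers agree with each other.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the findings; pairing them this way is an assumption.
precision = 0.9477   # GD&T precision
recall = 1.0         # assumed: GD&T is one of the 100%-recall categories

f1 = f1_score(precision, recall)

# Assumed definition: hallucination rate = share of predictions that are wrong,
# i.e. 1 - precision. The page itself does not define the term.
hallucination = 1 - precision

print(f"F1            = {f1 * 100:.1f}%")
print(f"hallucination = {hallucination * 100:.2f}%")
```

If the paper defines hallucination differently, only the last step changes; the F1 computation is the standard harmonic mean.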

Abstract

Accurate extraction of key information from 2D engineering drawings is crucial for high-precision manufacturing. Manual extraction is slow and labor-intensive, while traditional Optical Character Recognition (OCR) techniques often struggle with complex layouts and overlapping symbols, resulting in unstructured outputs. To address these challenges, this paper proposes a novel hybrid deep learning framework for structured information extraction by integrating an Oriented Bounding Box (OBB) detection model with a transformer-based document parsing model (Donut). An in-house annotated dataset is used to train YOLOv11 for detecting nine key categories: Geometric Dimensioning and Tolerancing (GD&T), General Tolerances, Measures, Materials, Notes, Radii, Surface Roughness, Threads, and Title Blocks. Detected OBBs are cropped into images and labeled to fine-tune Donut for structured JSON…
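
To make the "detected OBBs are cropped into images" step concrete, here is a minimal geometric sketch. It is an illustration under stated assumptions, not the authors' implementation: the box parameterization `(cx, cy, w, h, angle)` and the helper names `obb_corners` and `enclosing_crop` are hypothetical, and no YOLOv11 or Donut call is involved. It only shows how an oriented box maps to corner points and to an axis-aligned crop window clipped to the image.

```python
import math


def obb_corners(cx, cy, w, h, angle_rad):
    """Corner points of a box centered at (cx, cy), rotated by angle_rad."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    # Corner offsets in the box's own (unrotated) frame.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset, then translate to the box center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]


def enclosing_crop(corners, img_w, img_h):
    """Axis-aligned (left, top, right, bottom) window around the corners,
    clipped to the image bounds."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    left = max(0, math.floor(min(xs)))
    top = max(0, math.floor(min(ys)))
    right = min(img_w, math.ceil(max(xs)))
    bottom = min(img_h, math.ceil(max(ys)))
    return left, top, right, bottom


# Hypothetical detection: a 40x10 px annotation rotated by 30 degrees.
corners = obb_corners(100, 50, 40, 10, math.radians(30))
print(enclosing_crop(corners, img_w=640, img_h=480))
```

A real pipeline would typically also rotate the crop upright before passing it to the parser, so that text in the region reads horizontally; that step is omitted here.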

Taxonomy

Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Image Processing and 3D Reconstruction