ViRED: Prediction of Visual Relations in Engineering Drawings

Chao Gu; Ke Lin; Yiyang Luo; Jiahui Hou; Xiang-Yang Li

arXiv:2409.00909·cs.CV·September 4, 2024

TL;DR

ViRED is a vision-based model that predicts relationships between tables and circuits in engineering drawings, outperforming existing methods in accuracy while keeping inference fast.

Contribution

The paper introduces ViRED, a novel vision-based relation detection model specifically tailored for engineering drawings, addressing limitations of text-focused and existing visual relation detection approaches.

Findings

01. Achieved 96% accuracy in relation prediction on engineering drawings.

02. Demonstrated fast inference speed with multiple objects.

03. Outperformed existing relation detection methods.

Abstract

To accurately understand engineering drawings, it is essential to establish the correspondence between images and their description tables within the drawings. Existing document understanding methods predominantly focus on text as the main modality, which is not suitable for documents containing substantial image information. In the field of visual relation detection, the structure of the task inherently limits its capacity to assess relationships among all entity pairs in the drawings. To address this issue, we propose a vision-based relation detection model, named ViRED, to identify the associations between tables and circuits in electrical engineering drawings. Our model mainly consists of three parts: a vision encoder, an object encoder, and a relation decoder. We implement ViRED using PyTorch to evaluate its performance. To validate the efficacy of ViRED, we conduct a series of…
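
The three-part design named in the abstract (vision encoder, object encoder, relation decoder) can be illustrated with a toy sketch. This is not the authors' implementation: it uses plain Python rather than PyTorch so that it runs without dependencies, and every function name, feature choice, the bilinear scoring rule, and the untrained random weights are assumptions made purely for illustration. What it does show is the structural idea of scoring every table-circuit pair in one pass.

```python
import math
import random

# Hypothetical stand-ins for the three components named in the abstract.
# None of these functions reproduce the paper's actual layers.

def vision_encoder(image, box):
    """Toy visual feature: mean pixel intensity inside the box."""
    x0, y0, x1, y1 = box
    pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(pixels) / len(pixels)

def object_encoder(image, box):
    """Combine the pooled visual feature with normalised box geometry."""
    x0, y0, x1, y1 = box
    height, width = len(image), len(image[0])
    return [
        vision_encoder(image, box),
        (x0 + x1) / (2 * width),   # centre x
        (y0 + y1) / (2 * height),  # centre y
        (x1 - x0) / width,         # box width
        (y1 - y0) / height,        # box height
    ]

def relation_decoder(table_feats, circuit_feats, weights):
    """Score every (table, circuit) pair: bilinear form, then sigmoid."""
    probs = []
    for t in table_feats:
        row = []
        for c in circuit_feats:
            logit = sum(t[i] * weights[i][j] * c[j]
                        for i in range(len(t)) for j in range(len(c)))
            row.append(1.0 / (1.0 + math.exp(-logit)))
        probs.append(row)
    return probs

def predict_relations(image, tables, circuits, weights, threshold=0.5):
    """Return the full probability matrix and the pairs above the threshold."""
    table_feats = [object_encoder(image, b) for b in tables]
    circuit_feats = [object_encoder(image, b) for b in circuits]
    probs = relation_decoder(table_feats, circuit_feats, weights)
    pairs = [(i, j) for i, row in enumerate(probs)
             for j, p in enumerate(row) if p >= threshold]
    return probs, pairs

# Made-up data: an 8x8 "drawing", two table boxes, three circuit boxes.
rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]
tables = [(0, 0, 3, 2), (5, 0, 8, 2)]                   # (x0, y0, x1, y1)
circuits = [(0, 3, 4, 8), (4, 3, 8, 6), (4, 6, 8, 8)]
weights = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(5)]  # untrained

probs, pairs = predict_relations(image, tables, circuits, weights)
```

Because the weights here are random, the predicted pairs carry no meaning; in a real model they would be learned from annotated drawings. The point of the sketch is only that the decoder emits one score per table-circuit pair, so all pairs are assessed together.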

Taxonomy

Topics: Data Visualization and Analytics

Methods: Focus · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings