ViRED: Prediction of Visual Relations in Engineering Drawings
Chao Gu, Ke Lin, Yiyang Luo, Jiahui Hou, Xiang-Yang Li

TL;DR
ViRED is a vision-based model designed to accurately predict relationships between tables and circuits in engineering drawings, outperforming existing methods with high accuracy and fast inference.
Contribution
The paper introduces ViRED, a novel vision-based relation detection model specifically tailored for engineering drawings, addressing limitations of text-focused and existing visual relation detection approaches.
Findings
Achieved 96% accuracy in relation prediction on engineering drawings.
Demonstrated fast inference speed with multiple objects.
Outperformed existing relation detection methods.
Abstract
To accurately understand engineering drawings, it is essential to establish the correspondence between images and their description tables within the drawings. Existing document understanding methods predominantly focus on text as the main modality, which is not suitable for documents containing substantial image information. In the field of visual relation detection, the structure of the task inherently limits its capacity to assess relationships among all entity pairs in the drawings. To address this issue, we propose a vision-based relation detection model, named ViRED, to identify the associations between tables and circuits in electrical engineering drawings. Our model mainly consists of three parts: a vision encoder, an object encoder, and a relation decoder. We implement ViRED using PyTorch to evaluate its performance. To validate the efficacy of ViRED, we conduct a series of…
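The pairwise structure of the task described in the abstract, where every table is checked against every circuit, can be sketched in a few lines of Python. This is only an illustration of the input and output shape of relation prediction, not ViRED itself. The `Box` type, the `predict_relations` function, the 0.5 threshold, and especially the distance-based `toy_relation_score` are all assumptions introduced here. In the actual model the score would come from the learned relation decoder operating on features from the vision and object encoders.

```python
from dataclasses import dataclass
from itertools import product
from math import exp, hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned object region in normalized image coordinates (hypothetical type)."""
    label: str  # "table" or "circuit"
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def toy_relation_score(a: Box, b: Box) -> float:
    """Stand-in for a learned relation decoder: closer centers score higher.

    The distance heuristic is an assumption for illustration only.
    """
    (ax, ay), (bx, by) = a.center, b.center
    return exp(-2.0 * hypot(ax - bx, ay - by))


def predict_relations(objects, score_fn=toy_relation_score, threshold=0.5):
    """Score every table-circuit pair and keep those at or above the threshold."""
    tables = [o for o in objects if o.label == "table"]
    circuits = [o for o in objects if o.label == "circuit"]
    relations = []
    for table, circuit in product(tables, circuits):
        score = score_fn(table, circuit)
        if score >= threshold:
            relations.append((table, circuit, score))
    return relations


if __name__ == "__main__":
    drawing = [
        Box("table", 0.0, 0.0, 0.2, 0.2),
        Box("circuit", 0.2, 0.0, 0.4, 0.2),  # adjacent to the table
        Box("circuit", 0.8, 0.8, 1.0, 1.0),  # far corner of the sheet
    ]
    for table, circuit, score in predict_relations(drawing):
        print(table.center, circuit.center, round(score, 3))
```

Because `score_fn` is a parameter, the heuristic can be swapped for any callable that maps a pair of objects to a score, which is where a trained model would plug in.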
Taxonomy
Topics: Data Visualization and Analytics
Methods: Focus · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
