Procedural Knowledge Extraction from Industrial Troubleshooting Guides Using Vision Language Models
Guillermo Gil de Avalle, Laura Maruster, Christos Emmanouilidis

TL;DR
This paper evaluates vision language models for automating the extraction of structured troubleshooting knowledge from industrial guides, highlighting model trade-offs and prompting strategies for practical use.
Contribution
The paper presents a comparative analysis of two VLMs under two prompting strategies, standard and layout-augmented, for extracting structured knowledge from industrial troubleshooting diagrams.
Findings
The two models show model-specific trade-offs between layout sensitivity and semantic robustness.
Augmented prompting improves extraction accuracy for troubleshooting layouts.
Standard prompts offer better semantic understanding in some cases.
Abstract
Industrial troubleshooting guides encode diagnostic procedures in flowchart-like diagrams where spatial layout and technical language jointly convey meaning. To integrate this knowledge into operator support systems, which assist shop-floor personnel in diagnosing and resolving equipment issues, the information must first be extracted and structured for machine interpretation. Performed manually, this extraction is labor-intensive and error-prone. Vision Language Models (VLMs) offer the potential to automate this process by jointly interpreting visual and textual meaning, yet their performance on such guides remains underexplored. This paper evaluates two VLMs on extracting structured knowledge, comparing two prompting strategies: a standard instruction-guided prompt versus an augmented approach that cues troubleshooting layout patterns. Results reveal model-specific trade-offs between layout…
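To make the two prompting strategies and the idea of "structured knowledge" more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Node` schema, the prompt wording, the layout cues, and the example motor procedure are hypothetical and are not taken from the paper, and no VLM is actually called.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One box in a troubleshooting flowchart (hypothetical schema)."""
    node_id: str
    kind: str  # "symptom", "check", or "action"
    text: str
    # Maps an edge label such as "yes"/"no" to the id of the next node.
    next_ids: dict[str, str] = field(default_factory=dict)


# Hypothetical instruction-only prompt.
STANDARD_PROMPT = (
    "Extract the troubleshooting procedure shown in the image as a list of "
    "nodes with id, kind, text, and outgoing edges."
)

# Illustrative layout cues only; not the wording used in the paper.
LAYOUT_CUES = (
    "Questions are checks with labelled branches. "
    "Arrows give the order in which steps are followed. "
    "Boxes at the end of a branch are corrective actions."
)


def build_prompt(augmented: bool) -> str:
    """Return the standard prompt, optionally extended with layout cues."""
    if augmented:
        return STANDARD_PROMPT + " " + LAYOUT_CUES
    return STANDARD_PROMPT


def follow(nodes: dict[str, Node], start: str, answers: list[str]) -> list[str]:
    """Walk the extracted graph from `start`, taking one edge per answer."""
    path = [start]
    current = nodes[start]
    for answer in answers:
        current = nodes[current.next_ids[answer]]
        path.append(current.node_id)
    return path


# Invented example of what an extraction result could look like.
example_nodes = {
    "n1": Node("n1", "symptom", "Motor does not start", {"next": "n2"}),
    "n2": Node("n2", "check", "Is supply voltage present?", {"yes": "n3", "no": "n4"}),
    "n3": Node("n3", "action", "Inspect the motor contactor"),
    "n4": Node("n4", "action", "Restore the power supply"),
}

print(follow(example_nodes, "n1", ["next", "no"]))  # ['n1', 'n2', 'n4']
```

Representing the result as a graph with labelled edges keeps both aspects the abstract mentions: the text of each step and the branching order that the diagram layout conveys. An operator support system could then walk the graph one answer at a time, as `follow` does.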
Taxonomy
Topics: Handwritten Text Recognition Techniques · Data Visualization and Analytics · BIM and Construction Integration
