Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding
Takamitsu Omasa, Ryo Koshihara, and Masumi Morishige

TL;DR
This paper introduces a seven-stage pipeline that encodes arrow directions to improve vision-language model (VLM) understanding of flowcharts, raising benchmark accuracy from 80% to 89% without fine-tuning.
Contribution
The novel arrow-aware pipeline explicitly encodes arrow directions and graph topology, enhancing flowchart comprehension by VLMs without task-specific training.
Findings
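Overall accuracy improved from 80% to 89% (+9 percentage points) on a 90-question benchmark drawn from 30 annotated flowcharts.
Next-step query accuracy increased from 25/30 (83%) to 30/30 (100%).
The method outperforms the baseline without task-specific fine-tuning; branch-result questions improve more modestly, and before-step questions remain difficult.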
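The percentage-point figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported in the abstract (25/30 and 30/30 for next-step queries, 80% and 89% overall); the helper name `pct` is illustrative and not taken from the paper.

```python
def pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage."""
    return 100.0 * correct / total


# Next-step queries: counts as reported in the abstract.
next_before = pct(25, 30)  # about 83.3%
next_after = pct(30, 30)   # 100%
next_gain_pp = next_after - next_before  # about 16.7, reported as +17 pp

# Overall accuracy: only the rounded percentages are reported.
overall_gain_pp = 89 - 80  # +9 percentage points

print(f"next-step: {next_before:.1f}% -> {next_after:.1f}% (+{next_gain_pp:.1f} pp)")
print(f"overall: +{overall_gain_pp} pp")
```

Percentage points are differences of percentages, so the next-step gain of roughly 16.7 rounds to the reported +17 pp.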
Abstract
Flowcharts are indispensable tools in software design and business-process analysis, yet current vision-language models (VLMs) frequently misinterpret the directional arrows and graph topology that set these diagrams apart from natural images. We introduce a seven-stage pipeline grouped into three broader processes: (1) arrow-aware detection of nodes and arrow endpoints; (2) optical character recognition (OCR) to extract node text; and (3) construction of a structured prompt that guides the VLMs. Tested on a 90-question benchmark distilled from 30 annotated flowcharts, the method raises overall accuracy from 80 % to 89 % (+9 percentage points) without any task-specific fine-tuning. The gain is most pronounced for next-step queries (25/30 -> 30/30; 100 %, +17 pp); branch-result questions improve more modestly, and before-step questions remain difficult. A parallel evaluation with an…
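To make the third process concrete, here is a minimal sketch of how detected nodes, OCR text, and arrow endpoints might be serialized into a structured prompt. It assumes the detection and OCR stages have already produced their outputs; the `Node` and `Arrow` types, the prompt wording, and the functions `build_prompt` and `next_steps` are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str  # hypothetical identifier assigned by the detector
    text: str     # node text as it might be extracted by OCR


@dataclass(frozen=True)
class Arrow:
    source: str      # node at the arrow's tail
    target: str      # node at the arrow's head
    label: str = ""  # optional branch label such as "Yes" or "No"


def build_prompt(nodes: list[Node], arrows: list[Arrow], question: str) -> str:
    """Serialize nodes and directed edges into plain text for a VLM prompt."""
    lines = ["Flowchart nodes:"]
    for node in nodes:
        lines.append(f"- {node.node_id}: {node.text}")
    lines.append("Directed edges (tail -> head):")
    for arrow in arrows:
        suffix = f" [{arrow.label}]" if arrow.label else ""
        lines.append(f"- {arrow.source} -> {arrow.target}{suffix}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


def next_steps(arrows: list[Arrow], node_id: str) -> list[str]:
    """Return the nodes reachable in one step by following arrow direction."""
    return [arrow.target for arrow in arrows if arrow.source == node_id]


# Made-up example flowchart, for illustration only.
nodes = [
    Node("n1", "Start"),
    Node("n2", "Is input valid?"),
    Node("n3", "Process input"),
    Node("n4", "Show error"),
]
arrows = [
    Arrow("n1", "n2"),
    Arrow("n2", "n3", "Yes"),
    Arrow("n2", "n4", "No"),
]

prompt = build_prompt(nodes, arrows, "What step follows 'Start'?")
print(prompt)
```

Writing each edge as an explicit tail-to-head pair puts the arrow direction in the text itself, so answering a next-step query reduces to a lookup such as `next_steps(arrows, "n1")`.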
Taxonomy
Methods: Sparse Evolutionary Training
