Arrow-Guided VLM: Enhancing Flowchart Understanding via Arrow Direction Encoding

Takamitsu Omasa, Ryo Koshihara, and Masumi Morishige

arXiv:2505.07864 · cs.AI · May 14, 2025

TL;DR

This paper introduces a seven-stage pipeline that encodes arrow directions to improve vision-language model (VLM) understanding of flowcharts, raising benchmark accuracy from 80% to 89% without fine-tuning.

Contribution

An arrow-aware pipeline that explicitly encodes arrow directions and graph topology, improving flowchart comprehension by VLMs without task-specific training.

Findings

01

Overall accuracy improved from 80% to 89% (+9 percentage points) on the 90-question benchmark.

02

Next-step query accuracy increased from 25/30 (83%) to 30/30 (100%).

03

The pipeline outperforms the baseline VLM without any task-specific fine-tuning.
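
The percentage-point figures above can be checked with a few lines of arithmetic. The sketch below uses only numbers reported on this page (25/30 and 30/30 for next-step queries, 80% and 89% overall); the function name `accuracy_pct` is a hypothetical helper, not something from the paper or its repository.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Return accuracy as a percentage (hypothetical helper for illustration)."""
    return 100.0 * correct / total


# Next-step queries: counts as reported in the abstract.
next_step_before = accuracy_pct(25, 30)
next_step_after = accuracy_pct(30, 30)
next_step_gain_pp = next_step_after - next_step_before

# Overall accuracy: only the rounded percentages are reported on this page.
overall_gain_pp = 89 - 80

print(f"next-step: {next_step_before:.1f}% to {next_step_after:.1f}% "
      f"(+{next_step_gain_pp:.1f} pp)")
print(f"overall: +{overall_gain_pp} pp")
```

A percentage-point gain is the plain difference between two percentages, so the +17 pp quoted for next-step queries is the rounded difference between 83.3% and 100%.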

Abstract

Flowcharts are indispensable tools in software design and business-process analysis, yet current vision-language models (VLMs) frequently misinterpret the directional arrows and graph topology that set these diagrams apart from natural images. We introduce a seven-stage pipeline grouped into three broader processes: (1) arrow-aware detection of nodes and arrow endpoints; (2) optical character recognition (OCR) to extract node text; and (3) construction of a structured prompt that guides the VLMs. Tested on a 90-question benchmark distilled from 30 annotated flowcharts, the method raises overall accuracy from 80% to 89% (+9 percentage points) without any task-specific fine-tuning. The gain is most pronounced for next-step queries (25/30 to 30/30; 100%, +17 pp); branch-result questions improve more modestly, and before-step questions remain difficult. A parallel evaluation with an…
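
To make the third process concrete, here is a minimal sketch of how detected nodes, OCR text, and arrow endpoints might be combined into a structured prompt. It is not the authors' implementation: the `Node` and `Arrow` types, the nearest-box endpoint matching, and the prompt wording are all assumptions chosen for illustration, and the detection and OCR stages are replaced by hand-written sample data.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Node:
    """A detected flowchart node (hypothetical type)."""
    node_id: str
    text: str  # text that OCR would extract from the node
    box: tuple[float, float, float, float]  # x_min, y_min, x_max, y_max


@dataclass(frozen=True)
class Arrow:
    """A detected arrow with explicit direction (hypothetical type)."""
    tail: tuple[float, float]  # point where the arrow starts
    head: tuple[float, float]  # point where the arrowhead ends


def distance_to_box(point, box):
    """Euclidean distance from a point to a box (0 if the point is inside)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    dx = max(x_min - x, 0.0, x - x_max)
    dy = max(y_min - y, 0.0, y - y_max)
    return hypot(dx, dy)


def nearest_node(point, nodes):
    """Assumed matching rule: attach an endpoint to the closest node box."""
    return min(nodes, key=lambda n: distance_to_box(point, n.box))


def build_edges(nodes, arrows):
    """Turn arrows into directed (source_id, target_id) pairs."""
    edges = []
    for arrow in arrows:
        src = nearest_node(arrow.tail, nodes)
        dst = nearest_node(arrow.head, nodes)
        if src.node_id != dst.node_id:  # skip degenerate self-matches
            edges.append((src.node_id, dst.node_id))
    return edges


def build_prompt(nodes, edges, question):
    """Serialize nodes and directed edges as text ahead of the question."""
    lines = ["Flowchart nodes:"]
    lines += [f"- {n.node_id}: {n.text}" for n in nodes]
    lines.append("Directed edges (source -> target):")
    lines += [f"- {s} -> {t}" for s, t in edges]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Hand-written sample data standing in for detector and OCR output.
nodes = [
    Node("N1", "Start", (0, 0, 100, 40)),
    Node("N2", "Check input", (0, 100, 100, 140)),
    Node("N3", "End", (0, 200, 100, 240)),
]
arrows = [
    Arrow(tail=(50, 42), head=(50, 98)),
    Arrow(tail=(50, 142), head=(50, 198)),
]

edges = build_edges(nodes, arrows)
prompt = build_prompt(nodes, edges, "What is the next step after 'Check input'?")
print(prompt)
```

Listing edges as explicit source and target pairs means the model reads the direction of each arrow from text instead of inferring it from pixels, which is the idea the abstract describes. The resulting string would be sent to the VLM alongside the flowchart image.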

Code & Models

Repositories

galirage/Arrow-Guided-VLM-Enhancing-Flowchart-Understanding-via-Arrow-Direction-Encoding
Official repository

Datasets

galirage/FC-Detection
dataset · 67 downloads

Taxonomy

Methods: Sparse Evolutionary Training