FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding
Huitong Pan, Qi Zhang, Cornelia Caragea, Eduard Dragut, Longin Jan Latecki

TL;DR
This paper introduces the FlowLearn dataset to evaluate large vision-language models on flowchart understanding, revealing current limitations and setting the stage for future improvements in scientific visual comprehension.
Contribution
The paper presents a new dataset, FlowLearn, with detailed annotations for flowchart understanding and evaluates state-of-the-art LVLMs, highlighting their strengths and weaknesses.
Findings
GPT-4V achieved 58% accuracy in node counting on simulated flowcharts.
Claude achieved 83% accuracy in OCR tasks.
No single model excels across all flowchart understanding tasks.
Abstract
Flowcharts are graphical tools for representing complex concepts in concise visual representations. This paper introduces the FlowLearn dataset, a resource tailored to enhance the understanding of flowcharts. FlowLearn contains complex scientific flowcharts and simulated flowcharts. The scientific subset contains 3,858 flowcharts sourced from scientific literature and the simulated subset contains 10,000 flowcharts created using a customizable script. The dataset is enriched with annotations for visual components, OCR, Mermaid code representation, and VQA question-answer pairs. Despite the proven capabilities of Large Vision-Language Models (LVLMs) in various visual understanding tasks, their effectiveness in decoding flowcharts - a crucial element of scientific communication - has yet to be thoroughly investigated. The FlowLearn test set is crafted to assess the performance of LVLMs in…
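To make the simulated subset and the node-counting task more concrete, here is a minimal sketch of how a script might emit a random flowchart as Mermaid code together with its ground-truth node count, and how counting accuracy could be scored. This is not the authors' generation script: the function names (`make_flowchart`, `accuracy`), the node labels, and the random-tree construction are all assumptions made for illustration.

```python
import random


def make_flowchart(num_nodes, num_extra_edges=0, seed=None):
    """Build a random connected flowchart (hypothetical generator, not FlowLearn's).

    Returns the Mermaid source plus the node and edge lists, which serve as
    ground truth for questions such as "how many nodes are there?".
    """
    rng = random.Random(seed)
    nodes = [f"N{i}" for i in range(num_nodes)]

    # Backbone: every node after the first receives one edge from a random
    # earlier node, so the chart is always connected.
    edges = [(nodes[rng.randrange(i)], nodes[i]) for i in range(1, num_nodes)]

    # Optionally add extra edges that are not already present.
    candidates = [
        (a, b) for a in nodes for b in nodes
        if a != b and (a, b) not in edges
    ]
    rng.shuffle(candidates)
    edges.extend(candidates[:num_extra_edges])

    # Serialize as Mermaid: node declarations first, then the arrows.
    lines = ["flowchart TD"]
    lines += [f'    {n}["Step {i}"]' for i, n in enumerate(nodes)]
    lines += [f"    {a} --> {b}" for a, b in edges]
    return "\n".join(lines), nodes, edges


def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the ground-truth answers."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


if __name__ == "__main__":
    code, nodes, edges = make_flowchart(num_nodes=5, num_extra_edges=2, seed=0)
    print(code)
    print("ground-truth node count:", len(nodes))
```

Because the generator keeps the node and edge lists alongside the Mermaid text, answers to counting questions come for free from the same data that produced the image, which is one plausible reason a scripted subset is convenient for evaluation.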
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
