An Online Reference-Free Evaluation Framework for Flowchart Image-to-Code Generation
Giang Son Nguyen, Zi Pong Lim, Sarthak Ketanbhai Modi, Yon Shin Teo, Wenya Wang

TL;DR
This paper introduces a reference-free, inference-time evaluation framework for flowchart image-to-code generation that uses OCR and Visual Entailment to assess quality without ground-truth references.
Contribution
It proposes two automated metrics, Recall OCR and Precision VE, and combines them into an F1 score, enabling reliable quality assessment in real-world production environments.
Findings
High correlation with ground-truth metrics (r > 0.9)
Effective in continuous quality monitoring
Applicable to arbitrary flowchart inputs
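As a rough illustration of the correlation finding, the sketch below computes a correlation coefficient between reference-free scores and ground-truth scores. It assumes that r denotes Pearson's correlation coefficient, which the summary does not state explicitly. The function name `pearson_r` and the sample values are hypothetical and are not results from the paper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalized; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up scores for illustration only (not data from the paper).
reference_free_scores = [0.2, 0.5, 0.7, 0.9]
ground_truth_scores = [0.25, 0.45, 0.75, 0.95]
r = pearson_r(reference_free_scores, ground_truth_scores)
```

A value of `r` close to 1 would indicate that the reference-free score ranks outputs much as the ground-truth metric does.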
Abstract
Vision-Language Models (VLMs) are increasingly used in document processing pipelines to convert flowchart images into structured code (e.g., Mermaid). In production, these systems process arbitrary inputs for which no ground-truth code exists, making output quality difficult to assess. We propose a reference-free evaluation framework that monitors flowchart image-to-code generation quality at inference time, using only the input image and the generated output. The framework introduces two automated metrics: Recall OCR, which estimates content coverage by extracting text from the input image via OCR as a proxy reference, and Precision VE, which detects hallucinated elements through Visual Entailment against the original image. Their harmonic mean, an F1 score, provides a unified quality score. Validation on the FlowVQA dataset shows strong…
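The way the two metrics combine can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: OCR output is assumed to arrive as a list of strings, coverage is approximated by case-insensitive substring matching, and the Visual Entailment model is abstracted away as a list of booleans (one per generated element). All function names are hypothetical.

```python
def recall_ocr(ocr_texts, generated_code):
    """Fraction of OCR-extracted strings found in the generated code.

    Assumption: matching is a case-insensitive, whitespace-normalized
    substring test; the paper may use a different matching rule.
    """
    code = " ".join(generated_code.lower().split())
    items = [" ".join(t.lower().split()) for t in ocr_texts if t.strip()]
    if not items:
        return 0.0
    found = sum(1 for t in items if t in code)
    return found / len(items)

def precision_ve(entailment_flags):
    """Fraction of generated elements judged as entailed by the image.

    Assumption: each flag is the boolean verdict of some Visual
    Entailment model for one generated element.
    """
    if not entailment_flags:
        return 0.0
    return sum(1 for flag in entailment_flags if flag) / len(entailment_flags)

def f1_score(recall, precision):
    """Harmonic mean of recall and precision (0.0 when both are zero)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

# Toy inputs for illustration only.
ocr_texts = ["Start", "Check input", "End"]
mermaid = "flowchart TD\n  A[Start] --> B{Check input}\n  B --> C[Done]"
ve_flags = [True, True, False]  # hypothetical verdicts for nodes A, B, C

recall = recall_ocr(ocr_texts, mermaid)
precision = precision_ve(ve_flags)
score = f1_score(recall, precision)
```

In this toy case the generated code misses the "End" label and contains one element ("Done") that the hypothetical entailment check rejects, so both recall and precision fall below 1 and the combined score reflects both kinds of error.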
Taxonomy
Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection
