An Online Reference-Free Evaluation Framework for Flowchart Image-to-Code Generation

Giang Son Nguyen; Zi Pong Lim; Sarthak Ketanbhai Modi; Yon Shin Teo; Wenya Wang

arXiv:2602.13376 · cs.CV · February 17, 2026

TL;DR

This paper introduces a reference-free, inference-time evaluation framework for flowchart image-to-code generation that uses OCR and Visual Entailment to assess quality without ground-truth references.

Contribution

It proposes two automated metrics, Recall OCR and Precision VE, whose harmonic mean gives an F1 score, enabling reliable quality assessment in real-world production environments where no reference code exists.
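Per the abstract, the combined score is the harmonic mean of the two metrics, F1 = 2RP / (R + P). A minimal Python sketch follows; the function name `f1_ocr_ve`, the range check, and the choice to return 0 when both inputs are 0 are assumptions made for illustration, not details taken from the paper.

```python
def f1_ocr_ve(recall_ocr: float, precision_ve: float) -> float:
    """Harmonic mean of Recall_OCR and Precision_VE (hypothetical helper).

    Assumes both inputs lie in [0, 1]. Returning 0.0 when both are zero
    is a convention chosen here to avoid dividing by zero.
    """
    for name, value in (("recall_ocr", recall_ocr), ("precision_ve", precision_ve)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if recall_ocr + precision_ve == 0.0:
        return 0.0
    return 2 * recall_ocr * precision_ve / (recall_ocr + precision_ve)


# High coverage but many hallucinated elements pulls the combined score down.
print(f1_ocr_ve(0.9, 0.5))
```

The harmonic mean is dominated by the smaller input, so an output that covers most of the image text but hallucinates heavily (or the reverse) cannot score well on the combined metric.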

Findings

1. High correlation with ground-truth metrics (r > 0.9)
2. Effective in continuous quality monitoring
3. Applicable to arbitrary flowchart inputs

Abstract

Vision-Language Models (VLMs) are increasingly used in document processing pipelines to convert flowchart images into structured code (e.g., Mermaid). In production, these systems process arbitrary inputs for which no ground-truth code exists, making output quality difficult to assess. We propose a reference-free evaluation framework that monitors flowchart image-to-code generation quality at inference time, using only the input image and the generated output. The framework introduces two automated metrics: $\text{Recall}_{\text{OCR}}$, which estimates content coverage by extracting text from the input image via OCR as a proxy reference, and $\text{Precision}_{\text{VE}}$, which detects hallucinated elements through Visual Entailment against the original image. Their harmonic mean, $\text{F1}_{\text{OCR-VE}}$, provides a unified quality score. Validation on the FlowVQA dataset shows strong…
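The two metrics described in the abstract can be sketched end to end, with heavy caveats: the abstract does not say how OCR text is matched to the generated code, how generated elements are phrased for Visual Entailment, or which OCR and VE models are used. Everything below is therefore an assumption for illustration: the simplified Mermaid regexes, the hypothetical helpers `parse_mermaid`, `recall_ocr` and `precision_ve`, fuzzy matching with `difflib.SequenceMatcher` at a hypothetical `min_ratio` of 0.8, the hypothesis templates, and `toy_entails`, a crude stand-in for a real VE model. The OCR output is likewise a hand-written list rather than the result of an actual OCR engine.

```python
import re
from collections.abc import Callable
from difflib import SequenceMatcher

# Simplified Mermaid patterns (an assumption; real Mermaid syntax is much richer).
# A node: an ID followed by a label in [], () or {}.
NODE_RE = re.compile(r"(\w+)\s*(?:\[([^\]]*)\]|\(([^)]*)\)|\{([^}]*)\})")
# An edge "SRC --> DST" or "SRC -->|label| DST"; the lookahead leaves DST
# unconsumed so chains like "A --> B --> C" yield both edges.
EDGE_RE = re.compile(r"(\w+)\s*-->\s*(?:\|([^|]*)\|\s*)?(?=(\w+))")


def parse_mermaid(code: str) -> tuple[dict[str, str], list[tuple[str, str, str]]]:
    """Return node labels by ID and edges as (src, edge_label, dst)."""
    labels: dict[str, str] = {}
    edges: list[tuple[str, str, str]] = []
    for line in code.splitlines():
        for m in NODE_RE.finditer(line):
            labels[m.group(1)] = next(g for g in m.groups()[1:] if g is not None).strip()
        # Drop node labels so only IDs and arrows remain: "A[Start] --> B" -> "A --> B".
        bare = NODE_RE.sub(lambda node: node.group(1), line)
        for m in EDGE_RE.finditer(bare):
            edges.append((m.group(1), (m.group(2) or "").strip(), m.group(3)))
    return labels, edges


def _normalize(text: str) -> str:
    return " ".join(re.findall(r"\w+", text.lower()))


def recall_ocr(ocr_texts: list[str], generated_texts: list[str], min_ratio: float = 0.8) -> float:
    """Share of OCR text spans (the proxy reference) fuzzily matched by generated text."""
    reference = [n for n in map(_normalize, ocr_texts) if n]
    candidates = [_normalize(t) for t in generated_texts]
    if not reference:
        return 0.0  # convention chosen here: no readable text, no coverage credit
    hits = sum(
        any(SequenceMatcher(None, ref, cand).ratio() >= min_ratio for cand in candidates)
        for ref in reference
    )
    return hits / len(reference)


def precision_ve(image: object, labels: dict[str, str], edges: list[tuple[str, str, str]],
                 entails: Callable[[object, str], bool]) -> float:
    """Share of generated elements, phrased as hypotheses, that the VE judge accepts."""
    hypotheses = [f'The flowchart contains a node labeled "{text}".' for text in labels.values()]
    for src, edge_label, dst in edges:
        claim = f'"{labels.get(src, src)}" leads to "{labels.get(dst, dst)}"'
        if edge_label:
            claim += f' via "{edge_label}"'
        hypotheses.append(claim + ".")
    if not hypotheses:
        return 0.0
    return sum(entails(image, h) for h in hypotheses) / len(hypotheses)


# Usage with stand-ins: the generated code invents "Send email" and omits "Retry".
MERMAID = """flowchart TD
    A[Start] --> B{Is input valid?}
    B -->|yes| C[Process order]
    B -->|no| D[Show error]
    C --> E[Send email]
"""
ocr_texts = ["Start", "Is input valid?", "yes", "no", "Process order", "Show error", "Retry"]


def toy_entails(image: object, hypothesis: str) -> bool:
    # Crude stand-in for a VE model: accept iff every quoted string is visible text.
    return all(part in image for part in re.findall(r'"([^"]*)"', hypothesis))


labels, edges = parse_mermaid(MERMAID)
generated = list(labels.values()) + [lbl for _, lbl, _ in edges if lbl]
recall = recall_ocr(ocr_texts, generated)
precision = precision_ve(set(ocr_texts), labels, edges, toy_entails)
print(f"Recall_OCR={recall:.3f}  Precision_VE={precision:.3f}")
```

On this toy input the missed "Retry" box costs recall (6 of 7 OCR spans matched), while the invented "Send email" node and its incoming edge cost precision (7 of 9 hypotheses accepted). This mirrors the division of labor in the abstract: the OCR side penalizes omissions, the VE side penalizes hallucinations. A real VE model would also judge whether the stated connections are visible in the image, which the toy judge does not.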


Taxonomy

Topics: Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection