Rethinking HTG Evaluation: Bridging Generation and Recognition
Konstantina Nikolaidou, George Retsinas, Giorgos Sfikas, and Marcus Liwicki

TL;DR
This paper introduces three evaluation metrics for Handwritten Text Generation (HTG) that assess generated images by recognition accuracy, writing style, and diversity, addressing limitations of existing metrics such as FID.
Contribution
The authors propose three measures tailored to HTG evaluation, built on the error/accuracy of Handwriting Text Recognition and Writer Identification models, as an alternative to general-purpose image metrics such as FID.
Findings
The new metrics better capture handwriting diversity and utility.
Existing metrics like FID are inadequate for HTG evaluation.
The proposed measures provide a more comprehensive assessment of generated handwriting quality.
Abstract
The evaluation of generative models for natural image tasks has been extensively studied. Similar protocols and metrics are used in cases with unique particularities, such as Handwriting Generation, even if they might not be completely appropriate. In this work, we introduce three measures tailored for HTG evaluation and argue that they are more expedient to evaluate the quality of generated handwritten images. The metrics rely on the recognition error/accuracy of Handwriting Text Recognition and Writer Identification models and emphasize writing style, textual content, and diversity as the main aspects that adhere to the content of handwritten images. We conduct comprehensive experiments on the IAM handwriting database, showcasing that widely used metrics such as FID fail to properly quantify…
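As an illustration of what "recognition error" typically means in handwriting recognition, the sketch below computes a character error rate (CER) between ground-truth texts and transcriptions. It is a minimal sketch under stated assumptions, not the paper's metric definitions: the transcriptions are assumed to come from some recognition model run on generated images, and the function names `edit_distance` and `character_error_rate` are hypothetical.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def character_error_rate(references: Sequence[str], predictions: Sequence[str]) -> float:
    """Total edit distance divided by total reference length (assumed CER convention)."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    total_edits = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical example: intended texts vs. what a recognizer read from generated images.
    intended = ["hello", "world"]
    recognized = ["hallo", "world"]
    print(character_error_rate(intended, recognized))
```

A lower value means the recognizer reads the generated images closer to the intended text. How such an error feeds into the three proposed measures is defined in the paper itself and is not reproduced here.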
