Skill-Aligned Annotation for Reliable Evaluation in Text-to-Image Generation

Abdelrahman Eldesokey; Merey Ramazanova; Ahmad Sait; Ansar Khangeldin; Karen Sanchez; Tong Zhang; Bernard Ghanem

arXiv:2605.13223·cs.CV·May 14, 2026


TL;DR

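This paper proposes skill-aligned annotation for text-to-image (T2I) evaluation, in which the annotation strategy is matched to the characteristics of each evaluation skill. Compared with uniform baselines such as Likert-scale rating or binary question answering (BQA), it yields higher inter-annotator agreement and more stable results across models.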

Contribution

The paper introduces a skill-aligned annotation framework for T2I evaluation, shows that it produces more consistent evaluation signals than uniform annotation baselines, and presents an automated pipeline that instantiates the protocol for scalable, fine-grained evaluation.

Findings

01. Skill-aligned annotation yields higher inter-annotator agreement.

02. It improves evaluation stability across different models.

03. The automated pipeline enables scalable, fine-grained assessment.

Abstract

Text-to-image (T2I) generation has advanced rapidly, making reliable evaluation critical as performance differences between models narrow. Existing evaluation practices typically apply uniform annotation mechanisms, such as Likert-scale or binary question answering (BQA), across heterogeneous evaluation skills, despite fundamental differences in their nature. In this work, we revisit T2I evaluation through the lens of skill-aligned annotation, where annotation strategies reflect the underlying characteristics of each evaluation skill. We systematically compare skill-aligned annotation against uniform baselines and show that it produces more consistent evaluation signals, with higher inter-annotator agreement and improved stability across models. Finally, we present an automated pipeline that instantiates the proposed evaluation protocol, enabling scalable and fine-grained evaluation…
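
To make "inter-annotator agreement" concrete, here is a minimal sketch of one common agreement statistic, Cohen's kappa for two annotators. The abstract does not say which statistic the authors use, so the choice of kappa, the function name `cohen_kappa`, and the toy labels are all assumptions made for illustration, not the paper's method or data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Illustrative only: the paper's actual agreement metric is not
    stated in the abstract, so this statistic is an assumption.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability both annotators pick the same
    # category if each labelled independently at their own base rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary (BQA-style) answers from two annotators.
annotator_1 = [1, 1, 1, 0]
annotator_2 = [1, 1, 0, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa corrects raw agreement for agreement expected by chance: a value near 1 indicates strong agreement and a value near 0 indicates agreement no better than chance. Comparing this kind of statistic between a uniform annotation scheme and a skill-aligned one, per evaluation skill, is one way the reported gain could be measured.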
