EvalCards: A Framework for Standardized Evaluation Reporting

Ruchira Dhar; Danae Sanchez Villegas; Antonia Karamolegkou; Alice Schiavone; Yifei Yuan; Xinyi Chen; Jiaang Li; Stella Frank; Laura De Grazia; Monorama Swain; Stephanie Brandl; Daniel Hershcovich; Anders Søgaard; Desmond Elliott

arXiv:2511.21695·cs.CL·December 1, 2025

Open Access

TL;DR

This paper introduces EvalCards, a standardized framework aimed at improving transparency, reproducibility, and governance in NLP evaluation reporting through a practical disclosure tool.

Contribution

The paper proposes EvalCards, a new framework for evaluation reporting that addresses current shortcomings in reproducibility, accessibility, and governance in NLP.

Findings

01. EvalCards enhance transparency for researchers and practitioners.

02. They provide a practical foundation for governance compliance.

03. The framework addresses key shortcomings in current reporting practices.

Abstract

Evaluation has long been a central concern in NLP, and transparent reporting practices are more critical than ever in today's landscape of rapidly released open-access models. Drawing on a survey of recent work on evaluation and documentation, we identify three persistent shortcomings in current reporting practices: reproducibility, accessibility, and governance. We argue that existing standardization efforts remain insufficient and introduce Evaluation Disclosure Cards (EvalCards) as a path forward. EvalCards are designed to enhance transparency for both researchers and practitioners while providing a practical foundation to meet emerging governance requirements.

Taxonomy

Topics: Ethics and Social Impacts of AI · Expert finding and Q&A systems · Wikis in Education and Collaboration