A Comparative Study of Controlled Text Generation Systems Using Level-Playing-Field Evaluation Principles

Michela Lorandi; Anya Belz

arXiv:2605.12395 · cs.CL · May 13, 2026

TL;DR

This paper proposes a standardized evaluation framework for controlled text generation (CTG) systems, reveals significant discrepancies between originally reported and re-evaluated performance, and emphasizes the need for fair, reproducible assessment methods.

Contribution

The paper introduces a level-playing-field (LPF) evaluation approach that generates and processes all system outputs in a standardized way and applies a shared set of datasets and metrics, enabling a fair comparison of CTG systems.

Findings

1. Re-evaluation shows that many systems perform worse than originally reported.
2. Standardized evaluation reveals substantial discrepancies between originally reported and re-evaluated system performance.
3. The results highlight the need for reproducible, standardized assessment practices.

Abstract

Background: Many different approaches to controlled text generation (CTG) have been proposed over recent years, but it is difficult to get a clear picture of which approach performs best, because different datasets and evaluation methods are used in each case to assess the control achieved.

Objectives: Our aim in the work reported in this paper is to develop an approach to evaluation that enables us to comparatively evaluate different CTG systems in a manner that is both informative and fair to the individual systems.

Methods: We use a level-playing-field (LPF) approach to comparative evaluation where we (i) generate and process all system outputs in a standardised way, and (ii) apply a shared set of evaluation methods and datasets, selected based on those currently in use, in order to ensure fair evaluation.

Results: When re-evaluated in this way, performance results for a…
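The two-step LPF procedure described in the Methods part of the abstract can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the `normalise` post-processing, the toy keyword lexicon, the `keyword_accuracy` and `length` metrics, and the two toy systems are all hypothetical stand-ins. What it shows is the structure of the protocol: every system runs on the same dataset, passes through the same output processing, and is scored with the same metrics, so score differences reflect the systems rather than the evaluation setup.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces (not from the paper): a system maps (prompt, target
# attribute) to generated text; a metric scores one processed output against
# the attribute it was meant to exhibit.
System = Callable[[str, str], str]
Metric = Callable[[str, str], float]


def normalise(text: str) -> str:
    """Shared output processing applied identically to every system.
    Assumption: collapse whitespace and lower-case; a real LPF setup
    would define its own fixed processing steps."""
    return " ".join(text.split()).lower()


def lpf_evaluate(
    systems: Dict[str, System],
    dataset: List[Tuple[str, str]],
    metrics: Dict[str, Metric],
) -> Dict[str, Dict[str, float]]:
    """Run every system on the same dataset, process its outputs the same
    way, and score them with the same metrics (mean over dataset items)."""
    results: Dict[str, Dict[str, float]] = {}
    for name, generate in systems.items():
        outputs = [(normalise(generate(prompt, attr)), attr) for prompt, attr in dataset]
        results[name] = {
            metric_name: mean(score(out, attr) for out, attr in outputs)
            for metric_name, score in metrics.items()
        }
    return results


# --- Toy usage: every name below is a hypothetical placeholder ---
LEXICON = {"positive": "great", "negative": "awful"}


def keyword_accuracy(output: str, attr: str) -> float:
    """1.0 if the attribute's marker word appears in the output, else 0.0."""
    return 1.0 if LEXICON[attr] in output.split() else 0.0


def length(output: str, attr: str) -> float:
    """Output length in whitespace tokens (the attribute is ignored)."""
    return float(len(output.split()))


DATASET = [("The movie was", "positive"), ("The food was", "negative")]
SYSTEMS: Dict[str, System] = {
    # Ignores the control attribute entirely.
    "system_a": lambda prompt, attr: prompt + "  GREAT",
    # Emits the marker word for the requested attribute.
    "system_b": lambda prompt, attr: prompt + " " + LEXICON[attr].upper(),
}
METRICS: Dict[str, Metric] = {"keyword_accuracy": keyword_accuracy, "length": length}

if __name__ == "__main__":
    print(lpf_evaluate(SYSTEMS, DATASET, METRICS))
```

Run as a script, it prints a per-system dictionary of mean metric scores. Because the hypothetical `system_a` ignores the control attribute, it scores 0.5 on the toy keyword metric against 1.0 for `system_b`, while both have the same mean length; under a shared protocol, such a gap cannot be an artefact of differing datasets or post-processing.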
