A Comparative Study of Controlled Text Generation Systems Using Level-Playing-Field Evaluation Principles
Michela Lorandi, Anya Belz

TL;DR
This paper proposes a standardized evaluation framework for controlled text generation (CTG) systems. Re-evaluating existing systems under this framework reveals significant discrepancies with their originally reported performance, which underlines the need for fair, reproducible assessment methods.
Contribution
It introduces a level-playing-field (LPF) evaluation approach that processes all system outputs in a standardized way and applies a shared set of datasets and metrics, so that CTG systems can be compared fairly.
Findings
Re-evaluation shows many systems perform worse than originally reported.
Standardized evaluation reveals substantial discrepancies in system performance.
The results highlight the need for reproducible, standardized assessment practices.
Abstract
Background: Many different approaches to controlled text generation (CTG) have been proposed over recent years, but it is difficult to get a clear picture of which approach performs best, because different datasets and evaluation methods are used in each case to assess the control achieved. Objectives: Our aim in the work reported in this paper is to develop an approach to evaluation that enables us to comparatively evaluate different CTG systems in a manner that is both informative and fair to the individual systems. Methods: We use a level-playing-field (LPF) approach to comparative evaluation where we (i) generate and process all system outputs in a standardised way, and (ii) apply a shared set of evaluation methods and datasets, selected based on those currently in use, in order to ensure fair evaluation. Results: When re-evaluated in this way, performance results for a…
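The two LPF steps named in the Methods, (i) shared processing of outputs and (ii) shared evaluation data and metrics, can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption for illustration only: the normalise function (whitespace collapsing), the keyword_control_accuracy metric (target keyword present in the output) and the example outputs are hypothetical and are not the authors' processing pipeline, datasets or metrics.

```python
from typing import Callable, Dict, List

Metric = Callable[[List[str], List[str]], float]


def normalise(text: str) -> str:
    # Hypothetical shared processing step: collapse runs of whitespace
    # and strip leading/trailing spaces. Not the paper's actual procedure.
    return " ".join(text.split())


def keyword_control_accuracy(outputs: List[str], targets: List[str]) -> float:
    # Hypothetical control metric: fraction of outputs that contain
    # their target keyword (case-insensitive).
    hits = sum(t.lower() in o.lower() for o, t in zip(outputs, targets))
    return hits / len(outputs)


def lpf_evaluate(
    system_outputs: Dict[str, List[str]],
    targets: List[str],
    metrics: Dict[str, Metric],
) -> Dict[str, Dict[str, float]]:
    """Score every system with identical processing, data and metrics."""
    results: Dict[str, Dict[str, float]] = {}
    for system, outputs in system_outputs.items():
        # All systems must be evaluated on the same shared inputs.
        if len(outputs) != len(targets):
            raise ValueError(f"{system}: expected {len(targets)} outputs")
        # Step (i): process every system's outputs in the same way.
        processed = [normalise(o) for o in outputs]
        # Step (ii): apply the same set of metrics to every system.
        results[system] = {name: fn(processed, targets) for name, fn in metrics.items()}
    return results


if __name__ == "__main__":
    # Hypothetical outputs from two systems for the same two control targets.
    outputs = {
        "system_a": ["The  food was great. ", "I loved the movie"],
        "system_b": ["Terrible service", "The movie was okay"],
    }
    targets = ["food", "movie"]
    scores = lpf_evaluate(outputs, targets, {"keyword_acc": keyword_control_accuracy})
    print(scores)  # {'system_a': {'keyword_acc': 1.0}, 'system_b': {'keyword_acc': 0.5}}
```

The point of the design is that no system gets its own preprocessing or its own metric implementation: differences in the scores can then be attributed to the systems rather than to the evaluation setup.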
