Text Style Transfer Evaluation Using Large Language Models
Phil Ostheimer, Mayank Nagda, Marius Kloft, Sophie Fellenz

TL;DR
This paper investigates the use of Large Language Models for evaluating Text Style Transfer, showing they correlate well with human judgments and outperform traditional metrics, especially with prompt ensembling.
Contribution
It demonstrates that LLMs can effectively evaluate TST, with strong correlation to human assessments, and introduces prompt ensembling to improve evaluation robustness.
Findings
LLMs often outperform traditional automated metrics in TST evaluation.
Zero-shot prompting with LLMs correlates strongly with human evaluations.
Prompt ensembling enhances the robustness of LLM-based TST evaluation.
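The findings above can be illustrated with a minimal sketch of prompt ensembling followed by a correlation check against human ratings. This summary does not say how the paper combines prompts or which correlation statistic it reports, so both choices here are assumptions: scores from several prompts are averaged per sample, and agreement with humans is measured with Spearman rank correlation. All names (`ensemble_scores`, `prompt_scores`, `human_scores`) and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ensemble_scores(scores_per_prompt):
    """Average each sample's score across prompts (assumed ensembling rule)."""
    return [mean(sample) for sample in zip(*scores_per_prompt)]


# Hypothetical 1-5 ratings for five transferred sentences.
# Each inner list holds the scores an LLM gave under one prompt wording.
prompt_scores = [
    [1, 2, 3, 4, 5],
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
]
human_scores = [1, 2, 3, 4, 5]  # made-up human ratings for the same sentences

ensembled = ensemble_scores(prompt_scores)

for idx, scores in enumerate(prompt_scores):
    print(f"prompt {idx}: rho = {spearman(scores, human_scores):.2f}")
print(f"ensemble: rho = {spearman(ensembled, human_scores):.2f}")
```

In this toy data, the averaged scores track the human ranking at least as closely as any single prompt, which is the kind of robustness gain ensembling is meant to provide. Real LLM outputs need not behave this way.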
Abstract
Evaluating Text Style Transfer (TST) is a complex task due to its multifaceted nature. The quality of the generated text is measured based on challenging factors, such as style transfer accuracy, content preservation, and overall fluency. While human evaluation is considered to be the gold standard in TST assessment, it is costly and often hard to reproduce. Therefore, automated metrics are prevalent in these domains. Nevertheless, it remains unclear whether these automated metrics correlate with human evaluations. Recent strides in Large Language Models (LLMs) have showcased their capacity to match and even exceed average human performance across diverse, unseen tasks. This suggests that LLMs could be a feasible alternative to human evaluation and other automated metrics in TST evaluation. We compare the results of different LLMs in TST using multiple input prompts. Our findings…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
