On the interaction of automatic evaluation and task framing in headline style transfer
Lorenzo De Mattei, Michele Cafagna, Huiyuan Lai, Felice Dell'Orletta, Malvina Nissim, Albert Gatt

TL;DR
This paper proposes a classifier-based evaluation method for headline style transfer and shows that it reflects differences between systems better than traditional metrics such as BLEU and ROUGE.
Contribution
It introduces a classifier-based evaluation approach for style transfer tasks, addressing limitations of human and corpus-based assessments.
Findings
Evaluation with purposely trained classifiers reflects differences between systems better than BLEU and ROUGE.
Traditional metrics like BLEU and ROUGE are less effective for subtle style differences.
Purposely trained classifiers offer an alternative for tasks such as style transfer, whose subtle textual differences are hard for humans to judge.
Abstract
An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method for this task based on purposely-trained classifiers, showing that it better reflects system differences than traditional metrics such as BLEU and ROUGE.
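The idea of scoring style transfer output with a purposely trained classifier can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the classifier, the data, or the scoring rule, so the Naive Bayes model, the style labels `style_a` and `style_b`, the toy headlines, and the function names below are all hypothetical. The sketch trains a classifier on headlines from two styles and then reports, for each system, the fraction of outputs that the classifier assigns to the target style.

```python
import math
from collections import Counter


def train_style_classifier(headlines_by_style):
    """Fit a multinomial Naive Bayes model with add-one smoothing.

    headlines_by_style maps a style label to a list of headlines.
    (Hypothetical stand-in; the paper's classifier is not described here.)
    """
    word_counts = {style: Counter() for style in headlines_by_style}
    for style, headlines in headlines_by_style.items():
        for headline in headlines:
            word_counts[style].update(headline.lower().split())
    vocab = set().union(*word_counts.values())
    total_docs = sum(len(h) for h in headlines_by_style.values())
    priors = {
        style: math.log(len(headlines) / total_docs)
        for style, headlines in headlines_by_style.items()
    }
    return word_counts, vocab, priors


def predict_style(model, headline):
    """Return the style label with the highest log-probability."""
    word_counts, vocab, priors = model
    scores = {}
    for style, counts in word_counts.items():
        denominator = sum(counts.values()) + len(vocab)
        score = priors[style]
        for word in headline.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denominator)
        scores[style] = score
    return max(scores, key=scores.get)


def style_transfer_accuracy(model, outputs, target_style):
    """Fraction of system outputs classified as the target style."""
    hits = sum(predict_style(model, o) == target_style for o in outputs)
    return hits / len(outputs)


# Toy training data, invented for illustration only.
training_data = {
    "style_a": [
        "government approves new budget plan",
        "parliament votes on tax reform",
        "minister announces budget reform",
    ],
    "style_b": [
        "shock as stars clash at gala",
        "fans furious after shock defeat",
        "stars stunned by furious row",
    ],
}
model = train_style_classifier(training_data)

# Outputs of two hypothetical systems that were asked to produce style_b.
system_1 = ["shock as minister clash", "fans furious after budget vote"]
system_2 = ["government approves new reform", "parliament votes on budget"]

for name, outputs in [("system_1", system_1), ("system_2", system_2)]:
    print(name, style_transfer_accuracy(model, outputs, "style_b"))
```

In this toy setup the score separates a system whose outputs carry the target style from one whose outputs stay in the source style, which is the kind of system difference the abstract says overlap metrics such as BLEU and ROUGE capture less well.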
Taxonomy
Topics: Human Motion and Animation · Multimodal Machine Learning Applications · Video Analysis and Summarization
