A Review of Human Evaluation for Style Transfer

Eleftheria Briakou; Sweta Agrawal; Ke Zhang; Joel Tetreault; Marine Carpuat

arXiv:2106.04747·cs.CL·June 10, 2021

TL;DR

This paper reviews the human evaluation methods used in 97 style transfer papers and finds that underspecified, non-standardized protocols hamper reproducibility and slow progress in the field.

Contribution

It provides a comprehensive summary of current human evaluation practices in style transfer research and discusses challenges in standardization and reproducibility.

Findings

01

Human evaluation protocols are often underspecified.

02

Lack of standardization hampers reproducibility.

03

Improving evaluation methods can advance the field.

Abstract

This paper reviews and summarizes human evaluation practices described in 97 style transfer papers with respect to three main evaluation aspects: style transfer, meaning preservation, and fluency. In principle, evaluations by human raters should be the most reliable. However, in style transfer papers, we find that protocols for human evaluations are often underspecified and not standardized, which hampers the reproducibility of research in this field and progress toward better human and automatic evaluation methods.

Code & Models

Repositories

Elbria/ST-human-review
Official
