Evaluating Style Transfer for Text

Remi Mir; Bjarke Felbo; Nick Obradovich; Iyad Rahwan

arXiv:1904.02295·cs.CL·April 5, 2019·6 cites

TL;DR

This paper addresses the lack of standard evaluation practices in text style transfer by identifying best practices and proposing automated metrics, validated on a Yelp sentiment dataset, that make evaluations more reliable and comparable.

Contribution

It introduces new automated evaluation metrics for style transfer, correlates them with human judgment, and provides guidelines for assessing tradeoffs between style transfer aspects.
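As a rough illustration of what correlating an automated metric with human judgment can look like, the sketch below computes a Pearson correlation between two score lists. The function name `pearson` and the toy scores are hypothetical and are not taken from the paper or its repository; the paper may use a different correlation statistic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all without the 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Made-up example values: one automated score and one human rating per output.
automated_scores = [0.10, 0.35, 0.50, 0.80]
human_ratings = [1.0, 2.0, 3.5, 4.5]
print(pearson(automated_scores, human_ratings))
```

A value near 1 would indicate that the automated metric ranks outputs much as human raters do; a value near 0 would indicate little linear relationship.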

Findings

01 The proposed automated metrics correlate more strongly with human judgments than previously used metrics.

02 The three examined models show tradeoffs among style transfer intensity, content preservation, and naturalness.

03 Software tools for evaluation are publicly released.

Abstract

Research in the area of style transfer for text is currently bottlenecked by a lack of standard evaluation practices. This paper aims to alleviate this issue by experimentally identifying best practices with a Yelp sentiment dataset. We specify three aspects of interest (style transfer intensity, content preservation, and naturalness) and show how to obtain more reliable measures of them from human evaluation than in previous work. We propose a set of metrics for automated evaluation and demonstrate that they are more strongly correlated and in agreement with human judgment: direction-corrected Earth Mover's Distance, Word Mover's Distance on style-masked texts, and adversarial classification for the respective aspects. We also show that the three examined models exhibit tradeoffs between aspects of interest, demonstrating the importance of evaluating style transfer models at specific…
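The abstract names direction-corrected Earth Mover's Distance as the metric for style transfer intensity. A minimal sketch of one way to read that idea follows. It assumes each text has a style distribution from a classifier (for example, probabilities over negative and positive sentiment), that classes are ordered with unit spacing, and that "direction correction" means negating the distance when the output moves away from the target style. The function names and example distributions are hypothetical, not the authors' code.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth Mover's Distance between two distributions over the same
    ordered classes, assuming unit distance between adjacent classes."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of classes")
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))
    # In one dimension, EMD equals the summed absolute CDF difference.
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def direction_corrected_emd(source_dist, output_dist, target_index):
    """EMD between source and output style distributions, negated when the
    output has less probability mass on the target style than the source
    (an assumed interpretation of 'direction-corrected')."""
    distance = emd_1d(source_dist, output_dist)
    moved_toward_target = output_dist[target_index] >= source_dist[target_index]
    return distance if moved_toward_target else -distance


# Hypothetical classifier outputs as [P(negative), P(positive)].
source = [0.9, 0.1]   # input text, mostly negative
output = [0.2, 0.8]   # transferred text, mostly positive
print(direction_corrected_emd(source, output, target_index=1))
```

Under these assumptions, a large positive score means the output shifted strongly toward the target style, while a negative score flags a shift in the wrong direction that a plain distance would hide.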

Code & Models

Repositories

passeul/style-transfer-model-evaluation (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining