Dear Sir or Madam, May I Introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer
Sudha Rao, Joel Tetreault

TL;DR
This paper introduces the GYAFC dataset, a large corpus for formality style transfer, establishes benchmarks on it, and discusses the challenges of evaluating style transfer with automatic metrics.
Contribution
It presents the largest corpus for formality style transfer, adapts machine translation techniques as baselines, and analyzes issues with automatic evaluation metrics.
Findings
Established strong baseline methods from machine translation
Created the largest formality transfer dataset to date
Highlighted challenges in automatic metric evaluation
Abstract
Style transfer is the task of automatically transforming a piece of text in one particular style into another. A major barrier to progress in this field has been a lack of training and evaluation datasets, as well as benchmarks and automatic metrics. In this work, we create the largest corpus for a particular stylistic transfer (formality) and show that techniques from the machine translation community can serve as strong baselines for future work. We also discuss challenges of using automatic metrics.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
