Dear Sir or Madam, May I introduce the GYAFC Dataset: Corpus, Benchmarks and Metrics for Formality Style Transfer

Sudha Rao; Joel Tetreault

arXiv:1803.06535·cs.CL·April 17, 2018·20 cites

TL;DR

This paper introduces the GYAFC dataset for formality style transfer, provides a large corpus and benchmarks, and discusses the challenges of automatic metrics, with the aim of advancing research in style transfer.

Contribution

It presents the largest corpus for formality style transfer, adapts machine translation techniques as baselines, and analyzes issues with automatic evaluation metrics.

Findings

01. Established strong baseline methods from machine translation
02. Created the largest formality transfer dataset to date
03. Highlighted challenges in automatic metric evaluation

Abstract

Style transfer is the task of automatically transforming a piece of text in one particular style into another. A major barrier to progress in this field has been a lack of training and evaluation datasets, as well as benchmarks and automatic metrics. In this work, we create the largest corpus for a particular stylistic transfer (formality) and show that techniques from the machine translation community can serve as strong baselines for future work. We also discuss challenges of using automatic metrics.

Code & Models

Repositories

raosudha89/GYAFC-corpus (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques