Contextualizing Variation in Text Style Transfer Datasets

Stephanie Schoch; Wanyu Du; Yangfeng Ji

arXiv:2108.07871·cs.CL·August 19, 2021

TL;DR

This paper systematically analyzes existing text style transfer datasets, proposing a categorization of their stylistic and dataset properties to improve dataset selection and understanding in style transfer tasks.

Contribution

The paper proposes a categorization framework for understanding and comparing text style datasets, grounded in empirical analyses of existing datasets.

Findings

01. Identified key stylistic properties influencing dataset relationships

02. Proposed a categorization scheme for style and dataset properties

03. Enhanced understanding of dataset suitability for style transfer models

Abstract

Text style transfer involves rewriting the content of a source sentence in a target style. Despite there being a number of style tasks with available data, there has been limited systematic discussion of how text style datasets relate to each other. This understanding, however, is likely to have implications for selecting multiple data sources for model training. While it is prudent to consider inherent stylistic properties when determining these relationships, we also must consider how a style is realized in a particular dataset. In this paper, we conduct several empirical analyses of existing text style datasets. Based on our results, we propose a categorization of stylistic and dataset properties to consider when utilizing or comparing text style datasets.
