Contextualizing Variation in Text Style Transfer Datasets
Stephanie Schoch, Wanyu Du, Yangfeng Ji

TL;DR
This paper systematically analyzes existing text style transfer datasets, proposing a categorization of their stylistic and dataset properties to improve dataset selection and understanding in style transfer tasks.
Contribution
It proposes a categorization of stylistic and dataset properties, grounded in empirical analyses of existing text style datasets, for comparing and selecting those datasets.
Findings
Identified key stylistic properties influencing dataset relationships
Proposed a categorization scheme for style and dataset properties
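Showed that relating datasets requires considering both inherent stylistic properties and how a style is realized in a particular dataset, which bears on selecting multiple data sources for model training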
Abstract
Text style transfer involves rewriting the content of a source sentence in a target style. Despite there being a number of style tasks with available data, there has been limited systematic discussion of how text style datasets relate to each other. This understanding, however, is likely to have implications for selecting multiple data sources for model training. While it is prudent to consider inherent stylistic properties when determining these relationships, we also must consider how a style is realized in a particular dataset. In this paper, we conduct several empirical analyses of existing text style datasets. Based on our results, we propose a categorization of stylistic and dataset properties to consider when utilizing or comparing text style datasets.
