(Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Annotated Stylistic Language Dataset with Multiple Personas
Dongyeop Kang, Varun Gangal, Eduard Hovy

TL;DR
This paper introduces PASTEL, a large parallel stylistic language dataset with diverse personas, enabling systematic study and evaluation of stylistic variation in text, and demonstrates its utility in style prediction and transfer tasks.
Contribution
The paper presents PASTEL, a novel parallel and annotated stylistic language dataset with multiple personas, addressing the lack of controlled corpora for stylistic variation research.
Findings
PASTEL enables more accurate style prediction experiments.
Supervised models trained on PASTEL outperform unsupervised style transfer models.
The dataset facilitates controlled evaluation of stylistic language models.
Abstract
Stylistic variation in text needs to be studied from several angles, including the writer's personal traits, interpersonal relations, rhetoric, and more. Despite recent attempts at computational modeling of this variation, the lack of parallel corpora of stylistic language makes it difficult to systematically control stylistic change and to evaluate such models. We release PASTEL, the parallel and annotated stylistic language dataset, which contains ~41K parallel sentences (8.3K parallel stories) annotated across different personas. Each persona combines several styles: gender, age, country, political view, education, ethnicity, and time-of-writing. The dataset is collected from human annotators with solid control of input denotation: not only preserving the original meaning across texts, but also encouraging stylistic diversity among annotators. We test the dataset on two interesting…
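To make the "parallel sentences annotated across personas" idea concrete, here is a minimal Python sketch of how such records might be represented and grouped. It is not the released data format: the class names, field names, the `reference_id` key, and the sample sentences are all assumptions made up for illustration. Only the seven persona attributes come from the abstract, and the gender and education values are borrowed from the paper's title.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    # The seven persona attributes listed in the abstract.
    # Field names are assumptions, not the dataset's actual keys.
    gender: str
    age: str
    country: str
    political_view: str
    education: str
    ethnicity: str
    time_of_writing: str


@dataclass(frozen=True)
class StyledSentence:
    reference_id: str  # hypothetical key shared by sentences with the same meaning
    text: str
    persona: Persona


def group_parallel(sentences):
    """Group sentences that share a reference_id, i.e. parallel rewrites of one meaning."""
    groups = defaultdict(list)
    for sentence in sentences:
        groups[sentence.reference_id].append(sentence)
    return dict(groups)


def make_persona(gender, education):
    # Attributes other than gender and education are left as placeholders.
    return Persona(
        gender=gender,
        age="unspecified",
        country="unspecified",
        political_view="unspecified",
        education=education,
        ethnicity="unspecified",
        time_of_writing="unspecified",
    )


# Made-up placeholder records, NOT taken from PASTEL.
examples = [
    StyledSentence("s1", "We went to the park.", make_persona("Male", "Bachelor")),
    StyledSentence("s1", "Our group visited the park.", make_persona("Female", "Ph.D")),
    StyledSentence("s2", "It rained all day.", make_persona("Male", "Bachelor")),
]

parallel = group_parallel(examples)
```

Grouping by a shared meaning key is what would let a style-transfer model see the same content written under different personas, while the persona fields could serve as labels for style prediction.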
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Authorship Attribution and Profiling
