Learning to Paraphrase Sentences to Different Complexity Levels
Alison Chi, Li-Kuang Chen, Yi-Chen Chang, Shu-Hui Lee, Jason S. Chang

TL;DR
This paper introduces two unsupervised datasets for sentence simplification, complexification, and same-level paraphrasing. It reports state-of-the-art ASSET results among systems trained on unsupervised parallel data and measures the zero-shot performance of several large language models.
Contribution
It presents two new unsupervised datasets, one labeled by a weak classifier and the other by a rule-based approach, and compares them with a single supervised dataset across multitasking and prompting strategies.
Findings
Models trained on the weak-classifier-labeled dataset achieve state-of-the-art performance on the ASSET simplification benchmark among systems trained on unsupervised parallel data.
The models also outperform previous work on sentence-level targeting.
A handful of Large Language Models are evaluated on all three tasks in a zero-shot setting.
Abstract
While sentence simplification is an active research topic in NLP, its adjacent tasks of sentence complexification and same-level paraphrasing are not. To train models on all three tasks, we present two new unsupervised datasets. We compare these datasets, one labeled by a weak classifier and the other by a rule-based approach, with a single supervised dataset. Using these three datasets for training, we perform extensive experiments on both multitasking and prompting strategies. Compared to other systems trained on unsupervised parallel data, models trained on our weak classifier labeled dataset achieve state-of-the-art performance on the ASSET simplification benchmark. Our models also outperform previous work on sentence level targeting. Finally, we establish how a handful of Large Language Models perform on these tasks under a zero-shot setting.
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
