TL;DR
This paper presents a reinforcement learning approach that simplifies complex sentences to a level appropriate for ESL learners, increasing coverage of target-level vocabulary without requiring a parallel corpus.
Contribution
The paper introduces a reinforcement learning method that trains a large language model on its own generated outputs, using token-level and sentence-level rewards, so that simplifications for ESL learners cover more target-level vocabulary while maintaining quality, without needing parallel datasets.
Findings
Vocabulary coverage increased by over 20%
Enhanced diversity of simplified sentences
Maintained high quality of simplifications
Abstract
Text simplification is crucial for improving accessibility and comprehension for English as a Second Language (ESL) learners. This study goes a step further and aims to facilitate ESL learners' language acquisition by simplification. Specifically, we propose simplifying complex sentences to appropriate levels for learners while also increasing vocabulary coverage of the target level in the simplifications. We achieve this without a parallel corpus by conducting reinforcement learning on a large language model. Our method employs token-level and sentence-level rewards, and iteratively trains the model on its self-generated outputs to guide the model to search for simplification hypotheses that satisfy the target attributes. Experiment results on CEFR-SP and TurkCorpus datasets show that the proposed method can effectively increase the frequency and diversity of vocabulary of the target…
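The abstract mentions token-level rewards and vocabulary coverage of the target level but does not define them here. The following is a minimal sketch of one way such quantities could be computed, not the paper's actual formulation. The function names, the whitespace tokenizer, the binary reward, and the toy word list are all assumptions made for illustration.

```python
from typing import Iterable, List, Set


def tokenize(sentence: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation (assumed tokenizer)."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in sentence.lower().split())
    return [tok for tok in tokens if tok]


def token_rewards(sentence: str, target_vocab: Set[str]) -> List[float]:
    """Hypothetical token-level reward: 1.0 for a target-level word, 0.0 otherwise."""
    return [1.0 if tok in target_vocab else 0.0 for tok in tokenize(sentence)]


def vocabulary_coverage(sentences: Iterable[str], target_vocab: Set[str]) -> float:
    """Fraction of the target-level word list that appears at least once in the outputs."""
    if not target_vocab:
        return 0.0
    seen: Set[str] = set()
    for sentence in sentences:
        seen.update(tok for tok in tokenize(sentence) if tok in target_vocab)
    return len(seen) / len(target_vocab)


# Toy target-level word list (illustrative only, not taken from CEFR-SP).
vocab = {"cat", "sat", "mat", "dog"}
rewards = token_rewards("The cat sat.", vocab)
coverage = vocabulary_coverage(["The cat sat.", "A dog ran!"], vocab)
```

In this sketch, coverage rises only when a simplification uses a target-level word that earlier outputs have not used, which is one simple way to reflect both the frequency and the diversity of target vocabulary. A sentence-level reward (for example, one scoring the overall difficulty level of the output) would have to be combined with it separately.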
