SiTSE: Sinhala Text Simplification Dataset and Evaluation
Surangika Ranathunga, Rumesh Sirithunga, Himashi Rathnayake, Lahiru De Silva, Thamindu Aluthwala, Saman Peramuna, and Ravi Shekhar

TL;DR
This paper introduces a new Sinhala text simplification dataset, explores zero-shot and transfer learning methods with multilingual models, and demonstrates that intermediate task transfer learning improves simplification performance for low-resource languages.
Contribution
It provides the first manually curated Sinhala text simplification dataset and evaluates transfer learning techniques, highlighting challenges and proposing solutions for low-resource language processing.
Findings
Intermediate task transfer learning (ITTL) outperforms previously proposed zero-resource methods
Challenges in evaluating simplification systems
Need for improved evaluation metrics
Abstract
Text simplification is a task that has been minimally explored for low-resource languages. Consequently, there are only a few manually curated datasets. In this paper, we present a human-curated sentence-level text simplification dataset for the Sinhala language. Our evaluation dataset contains 1,000 complex sentences and 3,000 corresponding simplified sentences produced by three different human annotators. We model the text simplification task as a zero-shot and zero-resource sequence-to-sequence (seq-seq) task on the multilingual language models mT5 and mBART. We exploit auxiliary data from related seq-seq tasks and explore the possibility of using intermediate task transfer learning (ITTL). Our analysis shows that ITTL outperforms the previously proposed zero-resource methods for text simplification. Our findings also highlight the challenges in evaluating text simplification…
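The dataset shape described in the abstract (1,000 complex sentences, each paired with simplifications from three annotators) can be pictured with a small sketch. This is an illustration only: the class name `SimplificationItem`, the helper functions, and the placeholder strings are assumptions made for this example and are not taken from the paper or its released files.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SimplificationItem:
    """One evaluation entry: a complex sentence plus its human simplifications."""
    complex_sentence: str
    references: Tuple[str, ...]  # one simplification per annotator


def build_placeholder_dataset(n_items: int = 1000,
                              n_annotators: int = 3) -> List[SimplificationItem]:
    # Placeholder strings only (hypothetical); real entries would hold Sinhala text.
    return [
        SimplificationItem(
            complex_sentence=f"<complex sentence {i}>",
            references=tuple(
                f"<simplification {i} by annotator {a}>"
                for a in range(1, n_annotators + 1)
            ),
        )
        for i in range(n_items)
    ]


def to_pairs(dataset: List[SimplificationItem]) -> List[Tuple[str, str]]:
    """Flatten into (complex, simple) pairs, one pair per reference."""
    return [(item.complex_sentence, ref)
            for item in dataset
            for ref in item.references]


dataset = build_placeholder_dataset()
pairs = to_pairs(dataset)
print(len(dataset), len(pairs))  # 1000 3000
```

Keeping the three references grouped under one item, rather than storing flat pairs, would make it straightforward to score a system output against all references for the same source sentence.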
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Translation Studies and Practices
Methods: Attention Is All You Need · Byte Pair Encoding · Dense Connections · Layer Normalization · Adafactor · Residual Connection · Attention Dropout · Linear Layer · Softmax
