Towards Arabic Sentence Simplification via Classification and Generative Approaches
Nouran Khallaf, Serge Sharoff

TL;DR
This paper explores Arabic sentence simplification using classification and generative methods, leveraging Arabic-BERT, fastText, and mT5 models, with promising results demonstrated through BERTScore evaluation and manual analysis.
Contribution
It introduces a novel Arabic sentence simplification system combining lexical and generative approaches, with a new aligned corpus from an Arabic novel.
Findings
The lexical approach with Arabic-BERT and fastText achieved high BERTScore (F-1 0.97).
The generative mT5 model achieved moderate BERTScore (F-1 0.70).
Manual error analysis provided insights into model performance.
Abstract
This paper presents an attempt to build a Modern Standard Arabic (MSA) sentence-level simplification system. We experimented with sentence simplification using two approaches: (i) a classification approach leading to lexical simplification pipelines which use Arabic-BERT, a pre-trained contextualised model, as well as a model of fastText word embeddings; and (ii) a generative approach, a Seq2Seq technique that applies the multilingual Text-to-Text Transfer Transformer (mT5). We developed our training corpus by aligning the original and simplified sentences from the internationally acclaimed Arabic novel "Saaq al-Bambuu". We evaluate the effectiveness of these methods by comparing the generated simple sentences to the target simple sentences using the BERTScore evaluation metric. The simple sentences produced by the mT5 model achieve P 0.72, R 0.68 and F-1 0.70 via BERTScore, while, combining…
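As a rough illustration of how the precision, recall and F-1 figures above relate, the sketch below computes a simplified BERTScore-style greedy match over toy token vectors and then checks that the reported mT5 precision and recall are consistent with the reported F-1. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names `cosine`, `greedy_prf` and `f1` and the 2-dimensional toy vectors are hypothetical, real BERTScore uses contextual embeddings from a pre-trained model, and the optional IDF weighting is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def greedy_prf(cand_vecs, ref_vecs):
    """Simplified BERTScore-style greedy matching (no IDF weighting).

    Precision: each candidate token is matched to its most similar reference token.
    Recall: each reference token is matched to its most similar candidate token.
    """
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    recall = sum(max(cosine(c, r) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    return precision, recall, f1(precision, recall)

# Hypothetical toy vectors standing in for token embeddings.
candidate = [[1.0, 0.0], [0.0, 1.0]]   # two candidate tokens
reference = [[1.0, 0.0]]               # one reference token
p, r, f = greedy_prf(candidate, reference)
print(p, r, round(f, 4))

# Consistency check on the figures reported in the abstract for mT5.
print(round(f1(0.72, 0.68), 2))  # 0.7
```

The last line shows that P 0.72 and R 0.68 give an F-1 of about 0.70, matching the value quoted in the abstract.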
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Adafactor · Tanh Activation · Inverse Square Root Schedule · Gated Linear Unit · Sigmoid Activation · SentencePiece · Label Smoothing
