CLaC @ QATS: Quality Assessment for Text Simplification
Elnaz Davoodi, Leila Kosseim

TL;DR
This paper presents a machine learning approach using Random Forest classifiers to evaluate the quality of simplified texts across grammaticality, meaning preservation, and simplicity, for the QATS shared task.
Contribution
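The authors trained three independent Random Forest classifiers, one per quality aspect, using Google-Ngram language model features for grammaticality, word embeddings and WordNet synonyms for meaning preservation, and a wider feature set for simplicity.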
Findings
Accuracy of 58.73% for grammaticality, the highest of the reported aspects
Accuracy of 33.33% for the overall quality aspect, the lowest of the reported aspects
Simplicity features include TF-IDF, sentence length, and frequency of cue phrases
Abstract
This paper describes our approach to the 2016 QATS quality assessment shared task. We trained three independent Random Forest classifiers to assess the quality of the simplified texts in terms of grammaticality, meaning preservation, and simplicity. We used the Google-Ngram language model as a feature to predict grammaticality. Meaning preservation is predicted using two complementary approaches based on word embeddings and WordNet synonyms. A wider range of features, including TF-IDF, sentence length, and frequency of cue phrases, is used to evaluate the simplicity aspect. Overall, the accuracy of the system ranges from 33.33% for the overall aspect to 58.73% for grammaticality.
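The kinds of features named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The cue-phrase list, the toy embedding table, and the choice to average word vectors and compare them by cosine similarity are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
import re

# Hypothetical cue-phrase list; the paper's actual list is not given here.
CUE_PHRASES = ("however", "because", "in addition", "for example")


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def simplicity_features(sentence):
    """Return two simplicity-style features: token count and cue-phrase count."""
    tokens = tokenize(sentence)
    normalized = " ".join(tokens)
    cue_count = sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", normalized))
        for phrase in CUE_PHRASES
    )
    return {"length": len(tokens), "cue_phrases": cue_count}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def meaning_similarity(original, simplified, embeddings, dim):
    """Assumed meaning-preservation feature: cosine of averaged embeddings."""
    u = mean_vector(tokenize(original), embeddings, dim)
    v = mean_vector(tokenize(simplified), embeddings, dim)
    return cosine(u, v)


# Toy 2-dimensional embeddings, invented for this example only.
TOY_EMBEDDINGS = {"cat": [1.0, 0.0], "feline": [1.0, 0.0], "dog": [0.0, 1.0]}

features = simplicity_features("However, the cat sat because it was tired.")
similarity = meaning_similarity("The cat", "The feline", TOY_EMBEDDINGS, 2)
```

In a full pipeline, feature values like these would be collected per sentence pair and passed to a separate Random Forest classifier for each quality aspect, as the abstract describes.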
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques
