CLaC @ QATS: Quality Assessment for Text Simplification

Elnaz Davoodi; Leila Kosseim

arXiv:1708.05797 · cs.CL · August 22, 2017

TL;DR

This paper presents a machine learning approach using Random Forest classifiers to evaluate the quality of simplified texts across grammaticality, meaning preservation, and simplicity, for the QATS shared task.

Contribution

The authors built three independent Random Forest classifiers, one per quality aspect, using features such as Google-Ngram language model scores, word embeddings, and WordNet synonyms.

Findings

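01 Accuracy of 58.73% for grammaticality, the highest of the reported aspects

02 Accuracy of 33.33% for the overall quality aspect, the lowest of the reported aspects

03 Simplicity assessed with a wider feature set including TF-IDF, sentence length, and cue-phrase frequency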

Abstract

This paper describes our approach to the 2016 QATS quality assessment shared task. We trained three independent Random Forest classifiers to assess the quality of the simplified texts in terms of grammaticality, meaning preservation and simplicity. We used the Google-Ngram language model as a feature to predict grammaticality. Meaning preservation is predicted using two complementary approaches based on word embeddings and WordNet synonyms. A wider range of features, including TF-IDF, sentence length and frequency of cue phrases, is used to evaluate the simplicity aspect. Overall, the accuracy of the system ranges from 33.33% for the overall aspect to 58.73% for grammaticality.
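To make the feature types named in the abstract more concrete, here is a minimal sketch of two of them: an embedding-based similarity for meaning preservation, and two simplicity features (sentence length ratio and cue-phrase frequency). It is an illustration under stated assumptions, not the authors' implementation. The toy embedding table, the cue-phrase list, the tokenizer, and all function names are hypothetical. The abstract does not say how the features were computed, so averaging word vectors and comparing them by cosine similarity is one common choice rather than the paper's confirmed method.

```python
import math
import re

# Hypothetical toy embeddings; a real system would load pretrained vectors.
TOY_EMBEDDINGS = {
    "the": [0.1, 0.0, 0.2],
    "cat": [0.9, 0.3, 0.1],
    "sat": [0.2, 0.8, 0.4],
    "feline": [0.8, 0.4, 0.1],
}

# Hypothetical cue-phrase list (single-word cues only, for simplicity).
CUE_PHRASES = {"however", "because", "therefore", "although"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens found in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def embedding_similarity(original, simplified, embeddings=TOY_EMBEDDINGS):
    """Meaning-preservation feature: cosine of the two averaged sentence vectors."""
    u = mean_vector(tokenize(original), embeddings)
    v = mean_vector(tokenize(simplified), embeddings)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


def simplicity_features(original, simplified):
    """Simplicity features: token length ratio and cue-phrase frequency."""
    orig_tokens = tokenize(original)
    simp_tokens = tokenize(simplified)
    cue_count = sum(1 for t in simp_tokens if t in CUE_PHRASES)
    return {
        "length_ratio": len(simp_tokens) / len(orig_tokens) if orig_tokens else 0.0,
        "cue_phrase_freq": cue_count / len(simp_tokens) if simp_tokens else 0.0,
    }
```

Feature values like these would be collected per sentence pair and passed to a classifier; the abstract states that Random Forest classifiers were used, one per quality aspect, but the training setup is not shown here.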

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques