Lexical Bias In Essay Level Prediction

Georgios Balikas

arXiv:1809.08935 · cs.CL · September 25, 2018 · 1 citation

TL;DR

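This paper presents "balikasg", a system that achieved state-of-the-art performance among 14 systems in the CAp 2018 data science challenge on predicting the English level of non-native speakers from their written essays, and evaluates how its feature extraction, feature engineering and model selection choices affect performance.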

Contribution

It describes the system's feature extraction, feature engineering and model selection steps in detail and evaluates how these decisions affect performance on essay level prediction.

Findings

01 Achieved state-of-the-art performance in the CAp 2018 data science challenge among 14 systems

02 Evaluated how feature extraction, feature engineering and model selection decisions affect the system's performance

03 Concluded with remarks for future work

Abstract

Automatically predicting the level of non-native English speakers given their written essays is an interesting machine learning problem. In this work I present the system "balikasg" that achieved the state-of-the-art performance in the CAp 2018 data science challenge among 14 systems. I detail the feature extraction, feature engineering and model selection steps and I evaluate how these decisions impact the system's performance. The paper concludes with remarks for future work.
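
As a rough illustration of what a feature extraction step for this kind of task could look like, the sketch below computes a few simple lexical statistics for one essay. The function name `lexical_features`, the regex tokenizer and the choice of features are assumptions made for illustration; they are not taken from the paper and should not be read as the features used by balikasg.

```python
import re


def lexical_features(essay: str) -> dict:
    """Compute a few illustrative lexical features for one essay.

    Hypothetical example: these are generic lexical statistics,
    not the feature set described in the paper.
    """
    # Lowercased word tokens; a simple regex stands in for a real tokenizer.
    tokens = re.findall(r"[a-z']+", essay.lower())
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    return {
        "n_tokens": n_tokens,  # essay length in words
        "n_types": n_types,    # vocabulary size
        # Share of distinct words; 0.0 for an empty essay to avoid dividing by zero.
        "type_token_ratio": n_types / n_tokens if n_tokens else 0.0,
        "mean_word_length": sum(map(len, tokens)) / n_tokens if n_tokens else 0.0,
    }


features = lexical_features("I like my school. I like my friends.")
print(features)
```

Feature dictionaries of this kind could then be passed to any standard classifier to predict the essay level; which model works best is the sort of question the model selection step in the abstract addresses.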

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification