Lexical Bias In Essay Level Prediction
Georgios Balikas

TL;DR
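This paper introduces "balikasg", a system that achieved state-of-the-art performance among 14 systems in the CAp 2018 data science challenge on predicting the level of non-native English speakers from their written essays, and evaluates how feature engineering and model choices affect that performance.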
Contribution
It presents a system, with its feature extraction, feature engineering and model selection steps described in detail, that ranked first among the 14 systems in the CAp 2018 essay level prediction challenge.
Findings
Achieved state-of-the-art performance in the CAp 2018 data science challenge among 14 systems
Evaluated how feature extraction, feature engineering and model selection decisions affect the system's performance
Concluded with remarks for future work
Abstract
Automatically predicting the level of non-native English speakers given their written essays is an interesting machine learning problem. In this work I present the system "balikasg", which achieved state-of-the-art performance in the CAp 2018 data science challenge among 14 systems. I detail the feature extraction, feature engineering and model selection steps, and I evaluate how these decisions impact the system's performance. The paper concludes with remarks for future work.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
