Automated essay scoring with string kernels and word embeddings

Mădălina Cozma; Andrei M. Butnaru; Radu Tudor Ionescu

arXiv:1804.07954·cs.CL·July 9, 2018

TL;DR

This paper introduces a novel method for automatic essay scoring that combines string kernels and word embeddings, achieving state-of-the-art results and surpassing deep learning approaches.

Contribution

First to apply string kernels to essay scoring and to combine them with semantic word embeddings for improved accuracy.

Findings

01

Achieved best performance on the Automated Student Assessment Prize dataset.

02

Outperformed recent deep learning methods in both in-domain and cross-domain evaluations.

03

Demonstrated effectiveness of combining low-level string features with high-level semantic features.

Abstract

In this work, we present an approach based on combining string kernels and word embeddings for automatic essay scoring. String kernels capture the similarity among strings based on counting common character n-grams, which are a low-level yet powerful type of feature, demonstrating state-of-the-art results in various text classification tasks such as Arabic dialect identification or native language identification. To the best of our knowledge, we are the first to apply string kernels to automatically score essays. We are also the first to combine them with a high-level semantic feature representation, namely the bag-of-super-word-embeddings. We report the best performance on the Automated Student Assessment Prize data set, in both in-domain and cross-domain settings, surpassing recent state-of-the-art deep learning approaches.
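
To make the idea of counting common character n-grams concrete, here is a minimal Python sketch of one simple string kernel, a spectrum-style kernel that multiplies the counts of shared n-grams. This is an illustration only: the abstract does not say which kernel variant, n-gram range, or normalization the authors use, so the function names (`char_ngrams`, `spectrum_kernel`, `normalized_kernel`) and the default range of 1 to 3 characters are assumptions made for the example.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count every contiguous character n-gram of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a, b, n_min=1, n_max=3):
    """Similarity of two strings from their shared character n-grams.

    For each n in [n_min, n_max], adds the product of the counts of
    every n-gram that occurs in both strings. The range is an assumed
    default, not a value taken from the paper.
    """
    total = 0
    for n in range(n_min, n_max + 1):
        counts_a = char_ngrams(a, n)
        counts_b = char_ngrams(b, n)
        total += sum(c * counts_b[g] for g, c in counts_a.items() if g in counts_b)
    return total


def normalized_kernel(a, b, n_min=1, n_max=3):
    """Scale the kernel to [0, 1] so essay length does not dominate."""
    denom = math.sqrt(
        spectrum_kernel(a, a, n_min, n_max) * spectrum_kernel(b, b, n_min, n_max)
    )
    return spectrum_kernel(a, b, n_min, n_max) / denom if denom else 0.0
```

Under these assumptions, a matrix of `normalized_kernel` values computed over all pairs of training essays could be passed to any learner that accepts a precomputed kernel. Because the features are raw character sequences, the sketch needs no tokenizer or language-specific preprocessing.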
