Native Language Identification on Text and Speech

Marcos Zampieri; Alina Maria Ciobanu; Liviu P. Dinu

arXiv:1707.07182·cs.CL·July 25, 2017


TL;DR

This paper introduces an ensemble of SVM classifiers for native language identification on text and speech data. The system achieved 83.58% accuracy and ranked third in the NLI Shared Task 2017 fusion track.

Contribution

The paper presents an ensemble that combines the outputs of multiple SVM classifiers trained on character n-grams to identify the native language of non-native English speakers from both text and speech.
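
As a rough illustration of what character n-gram features look like, the sketch below counts every character n-gram in a string. The function name `char_ngrams` and the range parameters `n_min` and `n_max` are hypothetical, and the n-gram orders used here are an assumption; this page does not state which orders the ZCD system used.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Count all character n-grams of length n_min..n_max in text.

    Hypothetical helper for illustration; the n-gram range is an
    assumption, not a setting taken from the paper.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the text, one character at a time.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Bigrams and trigrams of a short example string.
features = char_ngrams("the cat", n_min=2, n_max=3)
```

Counts like these would typically be turned into a sparse feature vector per document before being passed to a classifier such as an SVM.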

Findings

01

Achieved 83.58% accuracy in the NLI Shared Task 2017 fusion track

02

Ranked 3rd among participating teams

03

Effective use of character n-grams for classification

Abstract

This paper presents an ensemble system that combines the output of multiple SVM classifiers for native language identification (NLI). The system was submitted to the fusion track of the NLI Shared Task 2017, which featured student essays and spoken responses, provided as audio transcriptions and i-vectors, from non-native English speakers of eleven native languages. Our system competed under the team name ZCD and was based on an ensemble of SVM classifiers trained on character n-grams; it achieved 83.58% accuracy and ranked 3rd in the shared task.
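
The abstract does not say how the outputs of the individual classifiers are combined. One common option is plurality voting, sketched below under that assumption. The function names, the three-classifier setup, and the language labels in the example are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def plurality_vote(votes):
    """Return the most frequent label; ties go to the earliest voter.

    Assumption: plurality voting is only one possible way to combine
    classifier outputs; the paper's actual fusion rule may differ.
    """
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label


def ensemble_predict(per_classifier_predictions):
    """Combine predictions from several classifiers, document by document.

    per_classifier_predictions: one list of labels per base classifier,
    all of the same length (one label per document).
    """
    return [plurality_vote(list(votes))
            for votes in zip(*per_classifier_predictions)]


# Hypothetical outputs of three base classifiers on three documents.
predictions = ensemble_predict([
    ["GER", "SPA", "ITA"],
    ["GER", "ITA", "ITA"],
    ["FRE", "SPA", "TUR"],
])
```

In this toy example each document gets the label chosen by at least two of the three classifiers. With real SVMs, an alternative would be to combine decision scores rather than hard labels.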
