Are pre-trained text representations useful for multilingual and multi-dimensional language proficiency modeling?

Taraka Rama; Sowmya Vajjala

arXiv:2102.12971 · cs.CL · February 26, 2021

TL;DR

This paper investigates the effectiveness of pre-trained multilingual embeddings in modeling multiple dimensions of language proficiency across German, Italian, and Czech, revealing that fine-tuned embeddings improve performance but do not excel uniformly across all proficiency aspects.

Contribution

It introduces a multi-dimensional, multilingual proficiency classification approach using pre-trained embeddings, highlighting their variable effectiveness across different proficiency dimensions.

Findings

01. Fine-tuned embeddings improve multilingual proficiency classification.

02. No single feature type performs best across all proficiency dimensions.

03. The study covers three languages (German, Italian, and Czech) and seven proficiency dimensions.

Abstract

Development of language proficiency models for non-native learners has been an active area of interest in NLP research for the past few years. Although language proficiency is multidimensional in nature, existing research typically considers a single "overall proficiency" while building models. Further, existing approaches also consider only one language at a time. This paper describes our experiments and observations about the role of pre-trained and fine-tuned multilingual embeddings in performing multi-dimensional, multilingual language proficiency classification. We report experiments with three languages -- German, Italian, and Czech -- and model seven dimensions of proficiency ranging from vocabulary control to sociolinguistic appropriateness. Our results indicate that while fine-tuned embeddings are useful for multilingual proficiency modeling, none of the features achieve…

Code & Models

Repositories

nishkalavallabhi/MultidimCEFRScoring
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification