TL;DR
This paper investigates universal CEFR language proficiency classification for German, Czech, and Italian using several feature types and models. Monolingual and multilingual models perform similarly, while cross-lingual transfer scores lower but remains comparable.
Contribution
It explores a universal approach to CEFR classification that uses domain-specific and domain-agnostic features, both theory-guided and data-driven, and evaluates it in monolingual, cross-lingual, and multilingual settings.
Findings
Monolingual and multilingual models achieve similar performance.
Cross-lingual classification results are lower than, but comparable to, monolingual results.
Universal features can be effective across different languages.
Abstract
The Common European Framework of Reference (CEFR) guidelines describe language proficiency of learners on a scale of 6 levels. While the description of CEFR guidelines is generic across languages, the development of automated proficiency classification systems for different languages follows different approaches. In this paper, we explore universal CEFR classification using domain-specific and domain-agnostic, theory-guided as well as data-driven features. We report the results of our preliminary experiments in monolingual, cross-lingual, and multilingual classification with three languages: German, Czech, and Italian. Our results show that both monolingual and multilingual models achieve similar performance, and cross-lingual classification yields lower, but comparable results to monolingual classification.
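The three evaluation settings named in the abstract differ mainly in which languages supply the training and test data. The sketch below illustrates that difference only; it is not the authors' pipeline. The toy corpus, the function names, the simple holdout split, and the choice of source and target language in the cross-lingual call are all assumptions made for illustration, since the abstract specifies none of them.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, str]           # (learner text, CEFR label)
Corpus = Dict[str, List[Sample]]   # language code -> samples

# Toy placeholder data (hypothetical, not taken from the paper).
TOY_CORPUS: Corpus = {
    "de": [("de text 1", "A2"), ("de text 2", "B1"), ("de text 3", "B2"), ("de text 4", "A2")],
    "cs": [("cs text 1", "A2"), ("cs text 2", "B1"), ("cs text 3", "B2"), ("cs text 4", "B1")],
    "it": [("it text 1", "A2"), ("it text 2", "B1"), ("it text 3", "B2"), ("it text 4", "B2")],
}


def holdout(samples: List[Sample], test_fraction: float = 0.25):
    """Deterministic holdout split (an assumption; the abstract names no protocol)."""
    n_test = max(1, int(len(samples) * test_fraction))
    cut = len(samples) - n_test
    return samples[:cut], samples[cut:]


def monolingual(corpus: Corpus, lang: str):
    """Train and test on the same language."""
    return holdout(corpus[lang])


def cross_lingual(corpus: Corpus, source: str, target: str):
    """Train on one language, test on a different one."""
    return list(corpus[source]), list(corpus[target])


def multilingual(corpus: Corpus):
    """Train and test on the pooled data of all languages."""
    train: List[Sample] = []
    test: List[Sample] = []
    for lang in sorted(corpus):
        lang_train, lang_test = holdout(corpus[lang])
        train.extend(lang_train)
        test.extend(lang_test)
    return train, test


if __name__ == "__main__":
    for name, (train, test) in {
        "monolingual (de)": monolingual(TOY_CORPUS, "de"),
        "cross-lingual (de -> it)": cross_lingual(TOY_CORPUS, "de", "it"),
        "multilingual (all)": multilingual(TOY_CORPUS),
    }.items():
        print(f"{name}: {len(train)} train / {len(test)} test")
```

In a real experiment the returned samples would be turned into feature vectors and passed to a classifier. Because the same features must be computable for every language, the cross-lingual and multilingual settings only make sense with language-independent (universal) features.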
