Classifying German Language Proficiency Levels Using Large Language Models

Elias-Leander Ahlers, Witold Brunsmann, and Malte Schilling

arXiv:2512.06483 · cs.CL · December 9, 2025




TL;DR

This study explores the application of Large Language Models to automatically classify German texts into CEFR proficiency levels, demonstrating improved accuracy and scalability over previous methods.

Contribution

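The paper introduces a diverse dataset that combines existing CEFR-annotated corpora with synthetic data, and evaluates three LLM-based approaches to CEFR classification (prompt engineering, fine-tuning, and probing of internal states), showing that they are effective.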
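Combining several corpora means mapping them onto one label scheme and making sure the same text does not appear twice, where it could leak between training and test data. The sketch below illustrates one way to do this under stated assumptions: the corpus names, the whitespace/case normalization used for deduplication, and the per-level (stratified) split are all illustrative choices, not the authors' actual pipeline, and `merge_corpora` and `stratified_split` are hypothetical helpers.

```python
import random
from collections import defaultdict

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def merge_corpora(corpora):
    """Merge {source_name: [(text, level), ...]} into one list of records.

    Labels are upper-cased, anything outside A1-C2 is dropped, and texts that
    are identical after whitespace/case normalization are kept only once.
    """
    seen = set()
    merged = []
    for source, samples in corpora.items():
        for text, level in samples:
            level = level.strip().upper()
            if level not in CEFR_LEVELS:
                continue
            key = " ".join(text.split()).lower()
            if key in seen:
                continue  # duplicate of an earlier text, possibly from another corpus
            seen.add(key)
            merged.append({"text": text, "level": level, "source": source})
    return merged


def stratified_split(records, test_fraction=0.2, seed=0):
    """Shuffle within each CEFR level and hold out about test_fraction of it."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for record in records:
        by_level[record["level"]].append(record)
    train, test = [], []
    for level in CEFR_LEVELS:
        group = list(by_level[level])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical toy corpora; real and synthetic sources are tagged by name.
corpora = {
    "real": [("Ich heiße Anna.", "a1"), ("Ich wohne in Berlin.", "A1"),
             ("Gestern bin ich ins Kino gegangen.", "A2")],
    "synthetic": [("ich heiße  Anna.", "A1"),
                  ("Obwohl es regnete, gingen wir spazieren.", "B1")],
}
merged = merge_corpora(corpora)
train, test = stratified_split(merged, test_fraction=0.5)
print(len(merged), len(train), len(test))  # 4 3 1
```

Keeping a `source` field makes it possible to report results separately for real and synthetic texts. Note that Python's `round` rounds halves to even, so very small per-level groups may end up entirely in the training split.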

Findings

01

Prompt engineering improves classification accuracy.

02

Fine-tuning LLaMA-3-8B-Instruct improves performance.

03

Probing internal neural states offers reliable classification.
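Finding 01 concerns prompt engineering. A minimal zero-shot sketch is shown here: a prompt that asks for exactly one of the six CEFR levels, plus a parser that pulls the level out of free-form model output. The prompt wording and the helper names `build_prompt` and `parse_level` are hypothetical and do not reproduce the paper's prompts; the call to the LLM itself is omitted.

```python
import re

CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")


def build_prompt(text: str) -> str:
    """Zero-shot classification prompt (illustrative wording only)."""
    levels = ", ".join(CEFR_LEVELS)
    return (
        "You are an expert examiner for German as a foreign language.\n"
        f"Classify the following German text into exactly one CEFR level ({levels}).\n"
        "Answer with the level only.\n\n"
        f"Text:\n{text}\n\nLevel:"
    )


# Matches a standalone level token such as "B2" or "b2".
_LEVEL_RE = re.compile(r"\b([ABC][12])\b", re.IGNORECASE)


def parse_level(response: str):
    """Return the first CEFR level mentioned in the model's answer, or None."""
    match = _LEVEL_RE.search(response)
    return match.group(1).upper() if match else None


prompt = build_prompt("Ich heiße Anna und wohne in Berlin.")
# response = ...  # hypothetical: send `prompt` to the LLM of your choice
print(parse_level("The text corresponds to level a2."))  # A2
```

Returning `None` for unparseable answers, instead of guessing a default level, keeps such failures visible when computing accuracy.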

Abstract

Assessing language proficiency is essential for education, as it enables instruction tailored to learners' needs. This paper investigates the use of Large Language Models (LLMs) for automatically classifying German texts into proficiency levels according to the Common European Framework of Reference for Languages (CEFR). To support robust training and evaluation, we construct a diverse dataset by combining multiple existing CEFR-annotated corpora with synthetic data. We then evaluate prompt-engineering strategies, fine-tuning of a LLaMA-3-8B-Instruct model, and a probing-based approach that uses the internal neural states of the LLM for classification. Our results show a consistent performance improvement over prior methods, highlighting the potential of LLMs for reliable and scalable CEFR classification.
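The probing-based approach trains a lightweight classifier on the LLM's internal activations instead of asking the model for an answer. The sketch below assumes a multinomial logistic-regression probe (a common choice, not necessarily the authors') and assumes the hidden-state vectors were extracted beforehand, for example by running the model with `output_hidden_states=True` in Hugging Face Transformers and mean-pooling one layer. To stay dependency-free it uses plain Python and toy one-hot vectors in place of real hidden states; `train_probe` and `predict` are hypothetical names.

```python
import math
import random

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_for(W, b, x):
    # One linear score per CEFR level.
    return [sum(w * xi for w, xi in zip(W[c], x)) + b[c] for c in range(len(W))]


def train_probe(features, labels, epochs=200, lr=0.5, seed=0):
    """Fit a multinomial logistic-regression probe by full-batch gradient descent.

    features: equal-length float vectors (assumed: pooled hidden states).
    labels:   CEFR level strings, one per vector.
    """
    rng = random.Random(seed)
    dim, k, n = len(features[0]), len(CEFR_LEVELS), len(features)
    W = [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(k)]
    b = [0.0] * k
    targets = [CEFR_LEVELS.index(label) for label in labels]
    for _ in range(epochs):
        gW = [[0.0] * dim for _ in range(k)]
        gb = [0.0] * k
        for x, t in zip(features, targets):
            p = softmax(logits_for(W, b, x))
            for c in range(k):
                # Cross-entropy gradient w.r.t. logit c is p_c - 1[c == t].
                err = p[c] - (1.0 if c == t else 0.0)
                gb[c] += err
                for j in range(dim):
                    gW[c][j] += err * x[j]
        for c in range(k):
            b[c] -= lr * gb[c] / n
            for j in range(dim):
                W[c][j] -= lr * gW[c][j] / n
    return W, b


def predict(W, b, x):
    scores = logits_for(W, b, x)
    return CEFR_LEVELS[max(range(len(scores)), key=scores.__getitem__)]


# Toy stand-in for hidden states: one direction per level, three samples each.
features = [[1.0 if j == i else 0.0 for j in range(6)] for i in range(6) for _ in range(3)]
labels = [CEFR_LEVELS[i] for i in range(6) for _ in range(3)]
W, b = train_probe(features, labels)
print(predict(W, b, [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]))  # B1
```

Because the LLM stays frozen and only the small linear layer is trained, a probe is much cheaper than fine-tuning. In practice a library implementation such as scikit-learn's `LogisticRegression` would replace the hand-written training loop.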


Code & Models

Datasets

EliasAhl/german-cefr


Taxonomy

Topics: Text Readability and Simplification · Second Language Acquisition and Learning · Natural Language Processing Techniques