From Labels to Facets: Building a Taxonomically Enriched Turkish Learner Corpus

Elif Sayar; Tolgahan Türker; Anna Golynskaia Knezhevich; Bihter Dereli; Ayşe Demirhas; Lionel Nicolas; Gülşen Eryiğit

arXiv:2601.22875 · cs.CL · February 4, 2026

Open Access

TL;DR

This paper introduces a semi-automated, taxonomically enriched annotation methodology for Turkish learner corpora, enabling detailed, multi-dimensional linguistic error analysis and improving querying and research capabilities.

Contribution

It presents the first Turkish learner corpus annotated with a multi-dimensional taxonomy and an annotation extension tool that enhances flat annotations with rich linguistic facets.

Findings

01. Facet-level accuracy of 95.86% achieved

02. Enables detailed error pattern analysis across linguistic dimensions

03. Supports advanced querying and exploratory research

Abstract

In terms of annotation structure, most learner corpora rely on holistic flat label inventories which, even when extensive, do not explicitly separate multiple linguistic dimensions. This makes linguistically deep annotation difficult and complicates fine-grained analyses aimed at understanding why and how learners produce specific errors. To address these limitations, this paper presents a semi-automated annotation methodology for learner corpora, built upon a recently proposed faceted taxonomy, and implemented through a novel annotation extension framework. The taxonomy provides a theoretically grounded, multi-dimensional categorization that captures the linguistic properties underlying each error instance, thereby enabling standardized, fine-grained, and interpretable enrichment beyond flat annotations. The annotation extension tool, implemented based on the proposed extension…
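
To make the contrast between a flat label and a faceted annotation concrete, the following is a minimal sketch, not the authors' tool. The label names (`CASE_ERR`, `SPELL_ERR`), the facet names (`linguistic_level`, `unit`), the mapping table, and the functions `extend` and `facet_accuracy` are all hypothetical placeholders. The accuracy function shows only one plausible reading of "facet-level accuracy" (the share of gold facet values reproduced); the page does not state how the paper defines it.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a flat error label to facet values.
# Labels and facet names are illustrative, not the paper's taxonomy.
FLAT_TO_FACETS = {
    "CASE_ERR": {"linguistic_level": "morphology", "unit": "case suffix"},
    "SPELL_ERR": {"linguistic_level": "orthography", "unit": "word"},
}


@dataclass
class Annotation:
    span: str                  # the erroneous learner text
    flat_label: str            # the original holistic label
    facets: dict = field(default_factory=dict)  # facet name -> value


def extend(annotation):
    """Return a copy enriched with facets; unknown labels get no facets
    and would be left for manual review in a semi-automated workflow."""
    facets = dict(FLAT_TO_FACETS.get(annotation.flat_label, {}))
    return Annotation(annotation.span, annotation.flat_label, facets)


def facet_accuracy(predicted, gold):
    """Share of gold facet values reproduced by the predicted annotations
    (an assumed definition, used here only for illustration)."""
    total = correct = 0
    for p, g in zip(predicted, gold):
        for name, value in g.facets.items():
            total += 1
            correct += p.facets.get(name) == value
    return correct / total if total else 0.0
```

Because each facet is stored as a separate key, a query such as "all errors at the morphology level" becomes a filter on one field instead of a search over a list of label names.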

Taxonomy

Topics: Second Language Acquisition and Learning · Natural Language Processing Techniques · Text Readability and Simplification