From Labels to Facets: Building a Taxonomically Enriched Turkish Learner Corpus
Elif Sayar, Tolgahan Türker, Anna Golynskaia Knezhevich, Bihter Dereli, Ayşe Demirhas, Lionel Nicolas, Gülşen Eryiğit

TL;DR
This paper introduces a semi-automated, taxonomically enriched annotation methodology for Turkish learner corpora, enabling detailed, multi-dimensional linguistic error analysis and improving querying and research capabilities.
Contribution
It presents the first Turkish learner corpus annotated with a multi-dimensional taxonomy, together with an annotation extension tool that enriches flat annotations with linguistic facets.
Findings
Achieves a facet-level accuracy of 95.86%
Enables detailed error pattern analysis across linguistic dimensions
Supports advanced querying and exploratory research
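To make the contrast between flat labels and facets concrete, the following is a minimal sketch of how a flat label could be extended into facets and then queried along a single dimension. It is not the paper's tool: the label names, facet names, mapping table, and the `extend`, `query`, and `facet_accuracy` helpers are all hypothetical, and the accuracy function is only one plausible reading of "facet-level accuracy".

```python
from dataclasses import dataclass, field

# Hypothetical flat-label -> facet mapping. These label and facet names are
# illustrative assumptions and are NOT taken from the paper's taxonomy.
FLAT_TO_FACETS = {
    "CASE_SUFFIX": {"level": "morphology", "unit": "suffix", "operation": "substitution"},
    "SPELLING": {"level": "orthography", "unit": "character", "operation": "substitution"},
    "MISSING_WORD": {"level": "syntax", "unit": "word", "operation": "omission"},
}


@dataclass
class Annotation:
    span: str                      # the annotated learner text span
    flat_label: str                # the original holistic label
    facets: dict = field(default_factory=dict)


def extend(annotation):
    """Return a copy of the annotation enriched with facets for its flat label."""
    facets = dict(FLAT_TO_FACETS.get(annotation.flat_label, {}))
    return Annotation(annotation.span, annotation.flat_label, facets)


def query(annotations, **criteria):
    """Select annotations whose facets match every given facet=value pair."""
    return [
        a for a in annotations
        if all(a.facets.get(name) == value for name, value in criteria.items())
    ]


def facet_accuracy(predicted, gold):
    """Share of gold facet values reproduced by the predicted facets (assumed definition)."""
    total = sum(len(g) for g in gold)
    correct = sum(p.get(k) == v for p, g in zip(predicted, gold) for k, v in g.items())
    return correct / total if total else 0.0


if __name__ == "__main__":
    corpus = [
        extend(Annotation("span-1", "CASE_SUFFIX")),
        extend(Annotation("span-2", "SPELLING")),
        extend(Annotation("span-3", "MISSING_WORD")),
    ]
    # A flat inventory cannot answer "all substitution errors" without
    # enumerating labels by hand; a facet query can.
    for hit in query(corpus, operation="substitution"):
        print(hit.flat_label, hit.facets)
```

The point of the design is that each facet is an independent dimension, so a query can cut across flat labels that share a property.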
Abstract
In terms of annotation structure, most learner corpora rely on holistic flat label inventories which, even when extensive, do not explicitly separate multiple linguistic dimensions. This makes linguistically deep annotation difficult and complicates fine-grained analyses aimed at understanding why and how learners produce specific errors. To address these limitations, this paper presents a semi-automated annotation methodology for learner corpora, built upon a recently proposed faceted taxonomy, and implemented through a novel annotation extension framework. The taxonomy provides a theoretically grounded, multi-dimensional categorization that captures the linguistic properties underlying each error instance, thereby enabling standardized, fine-grained, and interpretable enrichment beyond flat annotations. The annotation extension tool, implemented based on the proposed extension…
Taxonomy
Topics: Second Language Acquisition and Learning · Natural Language Processing Techniques · Text Readability and Simplification
