GiesKaNe: Bridging Past and Present in Grammatical Theory and Practical Application
Volker Emmrich

TL;DR
The paper presents GiesKaNe, a deeply annotated historical German corpus that balances standardization and innovation through human-machine collaboration, and introduces a novel text classification method.
Contribution
It introduces a new machine-assisted classification method for texts and a workflow for corpus compilation that uses existing infrastructure without specialized tools.
Findings
Human expertise and machine-assisted processes can be integrated effectively in corpus compilation.
Texts can be classified along the orality-literacy continuum with a novel machine-assisted method.
Ambitious corpus projects are feasible with simple tools and existing infrastructure.
Abstract
This article explores the requirements for corpus compilation within the GiesKaNe project (University of Giessen and Kassel, Syntactic Basic Structures of New High German). The project is defined by three central characteristics: it is a reference corpus, a historical corpus, and a syntactically deeply annotated treebank. As a historical corpus, GiesKaNe aims to establish connections with both historical and contemporary corpora, ensuring its relevance across temporal and linguistic contexts. The compilation process strikes a balance between innovation and adherence to standards, addressing both internal project goals and the broader interests of the research community. The methodological complexity of such a project is managed through a complementary interplay of human expertise and machine-assisted processes. The article discusses foundational topics such as tokenization,…
Taxonomy
Topics: Historical Linguistics and Language Studies · Linguistic research and analysis · Linguistic Studies and Language Acquisition
