GrammarTagger: A Multilingual, Minimally-Supervised Grammar Profiler for Language Education
Masato Hagiwara, Joshua Tanner, Keisuke Sakaguchi

TL;DR
GrammarTagger is a multilingual, minimally-supervised grammar profiler that identifies grammatical features in texts to aid language education, supporting easy annotation and overlapping spans, with promising initial results in English and Chinese.
Contribution
It introduces a novel, low-resource approach for grammar profiling that learns from small annotated datasets and supports multilingual applications.
Findings
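Reaches an F1 score of approximately 0.6 in English and Chinese when trained on limited annotated data.
Supports overlapping spans and more intuitive annotation, and is less prone to error propagation than hand-crafted rules defined on constituency/dependency parses.
Enabled Octanove Learn, a search engine for language learning materials indexed by reading difficulty and grammatical features.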
Abstract
We present GrammarTagger, an open-source grammar profiler which, given an input text, identifies grammatical features useful for language education. The model architecture enables it to learn from a small number of texts annotated with spans and their labels, which 1) enables easier and more intuitive annotation, 2) supports overlapping spans, and 3) is less prone to error propagation, compared to complex hand-crafted rules defined on constituency/dependency parses. We show that we can bootstrap a grammar profiler model from only a couple hundred sentences in both English and Chinese, and that it can be further boosted by learning a multilingual model. With GrammarTagger, we also build Octanove Learn, a search engine of language learning materials indexed by their reading difficulty and grammatical features. The code and pretrained models are publicly available at…
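To make the idea of span-plus-label annotation with overlapping spans concrete, here is a minimal Python sketch. It is not taken from the GrammarTagger codebase: the `Span` class, the label names, the example sentence, and the exact-match scoring in `span_f1` are all assumptions chosen for illustration, and the paper's actual data format and evaluation protocol may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled span over tokens (hypothetical format, for illustration only)."""
    start: int  # token index, inclusive
    end: int    # token index, exclusive
    label: str  # grammar feature label (names below are made up)


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one token."""
    return a.start < b.end and b.start < a.end


def span_f1(gold, pred) -> float:
    """Exact-match span F1: a prediction counts only if start, end, and label all match."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


tokens = ["She", "has", "been", "reading", "the", "book", "that", "I", "recommended"]

# Two of the gold spans overlap, which a one-tag-per-token scheme could not express.
gold = [
    Span(1, 4, "present_perfect_progressive"),  # "has been reading"
    Span(2, 4, "progressive_aspect"),           # "been reading"
    Span(6, 9, "relative_clause_that"),         # "that I recommended"
]

# A hypothetical model output: two correct spans and one spurious span.
pred = [
    Span(1, 4, "present_perfect_progressive"),
    Span(6, 9, "relative_clause_that"),
    Span(4, 6, "definite_article"),
]

score = span_f1(gold, pred)
print(f"overlap: {overlaps(gold[0], gold[1])}, F1: {score:.3f}")
```

Because each span is stored independently, an annotator only marks the tokens and the label, and nothing depends on an upstream parse being correct.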
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
