Minimal-Edit Instruction Tuning for Low-Resource Indic GEC
Akhil Rajeev P

TL;DR
This paper introduces a minimal-edit instruction-tuning approach for low-resource Indic grammatical error correction (GEC). It combines an instruction-tuned large language model, conservative decoding, and classifier-informed prompts, and scores 92.41 GLEU on Malayalam and 81.44 GLEU on Hindi.
Contribution
The paper presents an augmentation-free instruction-tuning method for Indic GEC that pairs deterministic decoding with classifier-based prompts, aiming to improve efficiency and reproducibility.
Findings
Achieved 92.41 GLEU on Malayalam, sixth overall.
Achieved 81.44 GLEU on Hindi, third overall.
Demonstrated effectiveness of classifier-informed prompts and conservative decoding.
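As a rough illustration of what "conservative decoding" with a minimal-edit bias can mean in practice, the sketch below accepts a model hypothesis only when it stays close to the source sentence and otherwise falls back to the source. This is not the paper's normaliser: the function names, the NFC and whitespace normalisation, the token-level similarity measure, and the `min_similarity` threshold are all assumptions made for illustration.

```python
import difflib
import unicodedata


def normalise(text: str) -> str:
    """Apply Unicode NFC normalisation and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def conservative_output(source: str, hypothesis: str, min_similarity: float = 0.8) -> str:
    """Return the hypothesis only if it is a small edit of the source.

    Hypothetical guard: token-level similarity below `min_similarity`
    (an assumed threshold) is treated as an over-correction, and the
    normalised source is returned unchanged.
    """
    src = normalise(source)
    hyp = normalise(hypothesis)
    if not hyp:
        # An empty generation is never a valid correction.
        return src
    ratio = difflib.SequenceMatcher(None, src.split(), hyp.split()).ratio()
    return hyp if ratio >= min_similarity else src


if __name__ == "__main__":
    # One-token change: kept. Full rewrite: rejected in favour of the source.
    print(conservative_output("a b c d e f", "a b c d e g"))
    print(conservative_output("a b c d e f", "x y z"))
```

A guard of this kind trades recall for precision: it can discard a legitimate large correction, which is the intended bias when minimal edits are preferred.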
Abstract
Grammatical error correction for Indic languages faces limited supervision, diverse scripts, and rich morphology. We propose an augmentation-free setup that uses instruction-tuned large language models and conservative decoding. A 12B GEMMA 3 model is instruction-tuned in bnb 4-bit precision with parameter-efficient fine-tuning (PEFT) and Alpaca-style formatting. Decoding follows a deterministic, constraint-aware procedure with a lightweight normaliser that encourages minimal, meaning-preserving edits. We operationalise inference, subsequent to instruction fine-tuning (IFT), via a fixed, language-specific prompt directly synthesised from a deterministic error classifier's taxonomy, label distributions, and precedence ordering computed on the training corpus. Under the official untuned GLEU evaluation, the system scores 92.41 on Malayalam, sixth overall, and 81.44 on Hindi, third…
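To make the idea of a fixed, classifier-informed prompt more concrete, here is a minimal sketch of how an Alpaca-style prompt could be assembled from error-label counts gathered on a training corpus. The template follows the publicly known Alpaca format with an input field; everything else (the label names, the counts, the `top_k` cutoff, the instruction wording, and the use of frequency as the ordering) is hypothetical and not taken from the paper, whose taxonomy and precedence rules are not given in this summary.

```python
from collections import Counter

# Standard Alpaca prompt layout (instruction plus input variant).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)


def build_instruction(language: str, label_counts: Counter, top_k: int = 3) -> str:
    """Build a fixed instruction from the most frequent error labels.

    Ordering by descending count (ties broken alphabetically) is an
    assumption standing in for the paper's precedence ordering.
    """
    total = sum(label_counts.values())
    ranked = sorted(label_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    focus = ", ".join(f"{label} ({count / total:.0%})" for label, count in ranked)
    return (
        f"Correct the grammatical errors in the {language} sentence. "
        "Make only the minimal edits needed and preserve the meaning. "
        f"Check these error types in order: {focus}."
    )


def build_prompt(language: str, label_counts: Counter, sentence: str) -> str:
    """Fill the Alpaca template for one input sentence."""
    instruction = build_instruction(language, label_counts)
    return ALPACA_TEMPLATE.format(instruction=instruction, input=sentence)


if __name__ == "__main__":
    # Hypothetical label distribution; real counts would come from the classifier.
    counts = Counter({"agreement": 30, "spelling": 50, "case_marker": 20})
    print(build_prompt("Hindi", counts, "<source sentence>"))
```

Because the instruction depends only on corpus-level statistics, the same prompt is reused for every sentence in a language, which keeps inference deterministic on the prompt side.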
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Machine Learning and Algorithms
