Minimal-Edit Instruction Tuning for Low-Resource Indic GEC

Akhil Rajeev P

arXiv:2512.00219 · cs.CL · December 2, 2025

TL;DR

This paper introduces a minimal-edit instruction-tuning approach to low-resource Indic grammatical error correction (GEC). It combines an instruction-tuned large language model, conservative decoding, and classifier-informed prompts, and scores 92.41 GLEU on Malayalam and 81.44 GLEU on Hindi.

Contribution

The paper presents a novel augmentation-free instruction-tuning method for Indic GEC that pairs deterministic decoding with classifier-informed prompts, with the stated goal of improving efficiency and reproducibility.

Findings

01. Achieved 92.41 GLEU on Malayalam, sixth overall.

02. Achieved 81.44 GLEU on Hindi, third overall.

03. Demonstrated the effectiveness of classifier-informed prompts and conservative decoding.
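
To make "conservative decoding" in the findings above concrete, here is a minimal sketch of one way a minimal-edit guard could work: keep the model's output only when it stays close to the input, otherwise fall back to the unchanged input. This is an illustration, not the paper's normaliser. The function names, the use of Python's difflib, and the 0.5 threshold are all assumptions.

```python
from difflib import SequenceMatcher


def edit_ratio(source: str, candidate: str) -> float:
    """Return 1 minus difflib's similarity ratio over whitespace tokens.

    0.0 means the token sequences are identical; 1.0 means no tokens match.
    """
    src, cand = source.split(), candidate.split()
    if not src and not cand:
        return 0.0
    similarity = SequenceMatcher(a=src, b=cand, autojunk=False).ratio()
    return 1.0 - similarity


def conservative_accept(source: str, candidate: str,
                        max_edit_ratio: float = 0.5) -> str:
    """Hypothetical guard: accept the candidate only if it is a small edit.

    The threshold is an assumed value chosen for illustration.
    """
    # Collapse runs of whitespace so formatting noise is not counted as an edit.
    candidate = " ".join(candidate.split())
    if not candidate:
        return source  # empty generation: keep the input unchanged
    if edit_ratio(source, candidate) > max_edit_ratio:
        return source  # rewrite is too aggressive: keep the input unchanged
    return candidate


print(conservative_accept("she go to school", "she goes to school"))
print(conservative_accept("she go to school", "the weather is nice today"))
```

The first call accepts the one-token correction; the second rejects an unrelated rewrite and returns the input. A guard like this trades recall for precision, which suits an n-gram overlap metric such as GLEU when most of the input sentence is already correct.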

Abstract

Grammatical error correction for Indic languages faces limited supervision, diverse scripts, and rich morphology. We propose an augmentation-free setup that uses instruction-tuned large language models and conservative decoding. A 12B GEMMA 3 model is instruction-tuned in bnb 4-bit precision with parameter-efficient fine-tuning (PEFT) and Alpaca-style formatting. Decoding follows a deterministic, constraint-aware procedure with a lightweight normaliser that encourages minimal, meaning-preserving edits. We operationalise inference, subsequent to instruction fine-tuning (IFT), via a fixed, language-specific prompt directly synthesised from a deterministic error classifier's taxonomy, label distributions, and precedence ordering computed on the training corpus. Under the official untuned GLEU evaluation, the system scores 92.41 on Malayalam, sixth overall, and 81.44 on Hindi, third…
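
The abstract mentions Alpaca-style formatting and a fixed, language-specific prompt derived from the error classifier's label distribution. The sketch below shows how those two pieces might fit together. The template follows the widely used Stanford Alpaca layout; the instruction wording, the error labels, and the frequency-based ordering are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter

# Standard Alpaca-style layout (instruction + input + response).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{response}"
)


def build_instruction(language: str, labels: list[str]) -> str:
    """Build a fixed prompt from classifier labels seen in the training data.

    Labels are ordered by frequency (ties broken alphabetically); this
    ordering is an assumed stand-in for the paper's precedence ordering.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    lines = [
        f"Correct the grammatical errors in the {language} sentence. "
        "Make only the minimal edits needed and preserve the meaning."
    ]
    if ranked:
        lines.append("Common error types, most frequent first:")
    for label, n in ranked:
        lines.append(f"- {label} ({n / total:.0%} of training errors)")
    return "\n".join(lines)


def format_example(instruction: str, source: str, target: str = "") -> str:
    """Fill the template; leave target empty to get an inference prompt."""
    return ALPACA_TEMPLATE.format(
        instruction=instruction, input=source, response=target
    )


# Hypothetical labels, as if produced by an error classifier on training pairs.
train_labels = ["agreement", "agreement", "spelling", "case_marker"]
instruction = build_instruction("Hindi", train_labels)
print(format_example(instruction, "<source sentence>"))
```

Because the instruction is computed once from the training corpus and then frozen, the same prompt is used for every test sentence in that language, which fits the deterministic setup the abstract describes.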

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Machine Learning and Algorithms