A Simple Yet Effective Corpus Construction Framework for Indonesian Grammatical Error Correction
Nankai Lin, Meiyu Zeng, Wentao Huang, Shengyi Jiang, Lixian Xiao, Aimin Yang

TL;DR
This paper introduces a framework for constructing high-quality grammatical error correction (GEC) evaluation corpora for Indonesian, a low-resource language, and explores using large language models to streamline annotation efforts, demonstrating promising results.
Contribution
It presents a novel framework for Indonesian GEC corpus construction and evaluates the use of LLMs to improve annotation efficiency in low-resource settings.
Findings
Effective corpus construction framework for Indonesian GEC
Utilization of GPT-3.5-Turbo and GPT-4 enhances annotation efficiency
Potential for improving GEC performance in low-resource languages
Abstract
Currently, the majority of research in grammatical error correction (GEC) is concentrated on universal languages, such as English and Chinese. Many low-resource languages lack accessible evaluation corpora. How to efficiently construct high-quality evaluation corpora for GEC in low-resource languages has become a significant challenge. To fill these gaps, in this paper, we present a framework for constructing GEC corpora. Specifically, we focus on Indonesian as our research language and construct an evaluation corpus for Indonesian GEC using the proposed framework, addressing the limitations of existing evaluation corpora in Indonesian. Furthermore, we investigate the feasibility of utilizing existing large language models (LLMs), such as GPT-3.5-Turbo and GPT-4, to streamline corpus annotation efforts in GEC tasks. The results demonstrate significant potential for enhancing the…
Taxonomy
Topics: Natural Language Processing Techniques · Educational Technology Systems · Data Mining and Machine Learning Applications
Methods: Label Smoothing · Absolute Position Encodings · Linear Layer · Position-Wise Feed-Forward Layer · Cosine Annealing · Transformer · Byte Pair Encoding · Layer Normalization
