KoGEC : Korean Grammatical Error Correction with Pre-trained Translation Models
Taeeun Kim, Semin Jeong, Youngsook Song

TL;DR
KoGEC is a Korean grammatical error correction system that fine-tunes pre-trained translation models, outperforming large language models in accuracy and error diversity, and includes a user-friendly Chrome extension.
Contribution
This paper introduces KoGEC, a novel Korean GEC system based on fine-tuned NLLB models, demonstrating superior performance and balanced error correction compared to large language models.
Findings
Fine-tuned NLLB models outperform GPT-4 and HCX-3 in Korean GEC.
KoGEC achieves a balanced correction across various error types.
Vocabulary expansion decreases model performance.
Abstract
This research introduces KoGEC, a Korean Grammatical Error Correction system using pre-trained translation models. We fine-tuned NLLB (No Language Left Behind) models for Korean GEC, comparing their performance against large language models like GPT-4 and HCX-3. The study used two social media conversation datasets for training and testing. The NLLB models were fine-tuned using special language tokens to distinguish between original and corrected Korean sentences. Evaluation was done using BLEU scores and an "LLM as judge" method to classify error types. Results showed that the fine-tuned NLLB (KoGEC) models outperformed GPT-4o and HCX-3 in Korean GEC tasks. KoGEC demonstrated a more balanced error correction profile across various error types, whereas the larger LLMs tended to focus less on punctuation errors. We also developed a Chrome extension to make the KoGEC system accessible…
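The abstract frames correction as translation from erroneous to corrected Korean, marked by special language tokens and scored with BLEU. The sketch below illustrates that idea in plain Python under stated assumptions: the tag strings `<ko_orig>` and `<ko_corr>`, the helper names, the whitespace tokenization, and the add-one smoothing are all hypothetical choices made for illustration, since the abstract does not say which tokens or which BLEU variant the authors used.

```python
import math
from collections import Counter

# Hypothetical tag strings; the abstract does not name the actual tokens.
SRC_TAG = "<ko_orig>"   # marks the original (possibly erroneous) sentence
TGT_TAG = "<ko_corr>"   # marks the corrected sentence


def tag_pair(original, corrected):
    """Frame one GEC example as a source/target 'translation' pair."""
    return {
        "source": f"{SRC_TAG} {original}",
        "target": f"{TGT_TAG} {corrected}",
    }


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU on whitespace tokens.

    Assumption: add-one smoothing on every n-gram order, so short
    sentences do not collapse to zero. Scores from other BLEU
    implementations will differ.
    """
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped matches: a hypothesis n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    erroneous = "나는 학교에 갓다"   # contains a common spelling error
    corrected = "나는 학교에 갔다"
    print(tag_pair(erroneous, corrected))
    print(sentence_bleu(erroneous, corrected))   # uncorrected text vs. reference
    print(sentence_bleu(corrected, corrected))   # perfect correction
```

In a setup like this, a system's output would be scored against the reference correction, and leaving the input unchanged gives a natural baseline to compare against.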
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
