Does Incomplete Syntax Influence Korean Language Model? Focusing on Word Order and Case Markers
Jong Myoung Kim, Young-Jun Lee, Yong-jin Han, Sangkeun Jung, Ho-Jin Choi

TL;DR
This paper investigates how Korean language models handle incomplete syntax, such as incomplete word order and omitted case markers, using a new dataset, and finds that fine-tuning improves their flexibility and understanding of Korean sentence structures.
Contribution
Introduces the SIKO dataset to evaluate and enhance Korean language models' ability to process incomplete syntax, demonstrating improved performance through fine-tuning.
Findings
Models reflect Korean's syntactic flexibility
Fine-tuning with SIKO improves handling of incomplete inputs
SIKO dataset serves as effective data augmentation
Abstract
Syntactic elements, such as word order and case markers, are fundamental in natural language processing. Recent studies show that syntactic information boosts language model performance and offers clues for people to understand their learning mechanisms. Unlike languages with a fixed word order such as English, Korean allows for varied word sequences, despite its canonical structure, due to case markers that indicate the functions of sentence components. This study explores whether Korean language models can accurately capture this flexibility. We note that incomplete word orders and omitted case markers frequently appear in ordinary Korean communication. To investigate this further, we introduce the Syntactically Incomplete Korean (SIKO) dataset. Through SIKO, we assessed Korean language models' flexibility with incomplete syntax and confirmed the dataset's training value. Results…
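To make the two perturbations named in the abstract concrete, here is a minimal Python sketch that reorders the words of a sentence and strips case markers from them. It is an illustration only, not the procedure used to build SIKO, which this summary does not describe. The names `CASE_MARKERS`, `drop_case_marker`, and `perturb` are hypothetical, the marker list is a small assumed subset, and keeping the final word (usually the predicate) in place is likewise an assumption. Naive suffix stripping will misfire on nouns that happen to end in a marker-like syllable, so a real pipeline would more likely rely on a morphological analyzer.

```python
import random

# Assumed, non-exhaustive set of Korean case markers (multi-syllable ones first
# so that "에게"/"에서" are matched before "에"). Hypothetical, for illustration.
CASE_MARKERS = ("에게", "에서", "이", "가", "을", "를", "의", "에")


def drop_case_marker(eojeol: str) -> str:
    """Strip one trailing case marker from a space-delimited word, if present."""
    for marker in CASE_MARKERS:
        # Require a non-empty stem so a word that *is* a marker is left alone.
        if eojeol.endswith(marker) and len(eojeol) > len(marker):
            return eojeol[: -len(marker)]
    return eojeol


def perturb(sentence: str, shuffle: bool = True,
            drop_markers: bool = True, seed: int = 0) -> str:
    """Return a syntactically incomplete variant of a whitespace-tokenized sentence.

    Assumption: the last word is the predicate and stays in final position;
    only the preceding words are stripped of markers and/or reordered.
    """
    tokens = sentence.split()
    head, tail = tokens[:-1], tokens[-1:]
    if drop_markers:
        head = [drop_case_marker(t) for t in head]
    if shuffle:
        random.Random(seed).shuffle(head)  # seeded for reproducibility
    return " ".join(head + tail)


if __name__ == "__main__":
    original = "철수가 사과를 먹었다"  # "Cheolsu ate an apple."
    print(perturb(original, shuffle=False))        # markers dropped only
    print(perturb(original, drop_markers=False))   # word order changed only
```

Pairs of original and perturbed sentences produced this way could serve either as evaluation inputs or as augmented training data, which is the dual role the abstract attributes to SIKO.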
Taxonomy
Topics: Employee Welfare and Language Studies · Natural Language Processing Techniques
