Linguistically Informed Graph Model and Semantic Contrastive Learning for Korean Short Text Classification
JaeGeon Yoo, Byoungwook Kim, Yeongwook Yang, and Hong-Jun Jang

TL;DR
This paper introduces LIGRAM, a hierarchical graph model combined with semantic contrastive learning, specifically designed to improve Korean short text classification by capturing linguistic features and semantic similarities.
Contribution
The paper presents a novel hierarchical graph model and semantic contrastive learning approach tailored for Korean, addressing language-specific challenges in short text classification.
Findings
LIGRAM outperforms baseline models on Korean datasets.
Hierarchical graph construction effectively captures Korean linguistic features.
Semantic contrastive learning enhances class distinction in short texts.
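To make the idea of contrastive learning for class distinction concrete, here is a minimal sketch of a generic supervised contrastive loss in plain Python. It is not the paper's objective, which is not specified in this summary. The function names, the cosine similarity measure, and the temperature value are all assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def supervised_contrastive_loss(embeddings, labels, temperature=0.5):
    """Generic supervised contrastive loss (illustrative, not the paper's).

    For each anchor, same-label texts are positives and all other texts
    are contrasted against them. Anchors without a positive are skipped.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of the anchor to every other text.
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n) if j != i
        }
        log_denominator = math.log(sum(math.exp(x) for x in logits.values()))
        # Average negative log-probability of picking each positive.
        total += -sum(logits[j] - log_denominator for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Toy embeddings: the first two vectors are close, as are the last two.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
misaligned = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy setup the loss is lower when the labels agree with the geometry of the embeddings, which is the sense in which such an objective pulls same-class texts together and pushes different classes apart.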
Abstract
Short text classification (STC) remains a challenging task due to the scarcity of contextual information and labeled data. Existing approaches have predominantly focused on English because most STC benchmark datasets are available in English. Consequently, existing methods seldom incorporate the linguistic and structural characteristics of Korean, such as its agglutinative morphology and flexible word order. To address these limitations, we propose LIGRAM, a hierarchical heterogeneous graph model for Korean short-text classification. The proposed model constructs sub-graphs at the morpheme, part-of-speech, and named-entity levels and hierarchically integrates them to compensate for the limited contextual information in short texts while precisely capturing the grammatical and semantic dependencies inherent in Korean. In addition, we apply Semantics-aware…
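The three-level sub-graph construction described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not LIGRAM's actual construction: the input is assumed to be already analysed into (morpheme, POS tag, entity label) triples, the edge rules (sliding-window co-occurrence for morphemes, adjacency for POS tags, same-text co-occurrence for entity labels) are hypothetical choices, and the function names are invented for this example.

```python
from collections import defaultdict
from itertools import combinations

def build_subgraphs(docs, window=2):
    """Build weighted edge sets for three hypothetical sub-graphs.

    docs: list of texts, each a list of (morpheme, pos_tag, entity_or_None).
    A real pipeline would get these triples from a Korean morphological
    analyser and a named-entity tagger; here they are supplied by hand.
    """
    graphs = {name: defaultdict(int) for name in ("morpheme", "pos", "entity")}
    for doc in docs:
        morphs = [m for m, _, _ in doc]
        tags = [t for _, t, _ in doc]
        entities = sorted({e for _, _, e in doc if e is not None})
        # Morpheme level: co-occurrence within a sliding window.
        for i in range(len(morphs)):
            for j in range(i + 1, min(i + window + 1, len(morphs))):
                if morphs[i] != morphs[j]:
                    graphs["morpheme"][tuple(sorted((morphs[i], morphs[j])))] += 1
        # POS level: adjacent tags in the sequence.
        for a, b in zip(tags, tags[1:]):
            if a != b:
                graphs["pos"][tuple(sorted((a, b)))] += 1
        # Entity level: labels that appear in the same text.
        for a, b in combinations(entities, 2):
            graphs["entity"][(a, b)] += 1
    return {name: dict(edges) for name, edges in graphs.items()}

def link_texts(docs):
    """Connect each text to the nodes it contains at every level."""
    return [
        {
            "morpheme": {m for m, _, _ in doc},
            "pos": {t for _, t, _ in doc},
            "entity": {e for _, _, e in doc if e is not None},
        }
        for doc in docs
    ]

# Two hand-analysed toy texts (tags and labels are illustrative).
toy_docs = [
    [("서울", "NNP", "LOC"), ("에", "JKB", None), ("가", "VV", None), ("다", "EF", None)],
    [("부산", "NNP", "LOC"), ("에", "JKB", None), ("살", "VV", None), ("다", "EF", None)],
]
subgraphs = build_subgraphs(toy_docs)
text_links = link_texts(toy_docs)
```

Even in this toy case the two texts share POS-level edges and an entity label although their content morphemes differ, which hints at how the extra levels can supply context that a short text lacks on the surface.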
Taxonomy
Topics: Topic Modeling · Text and Document Classification Technologies · Text Readability and Simplification
