Building Korean linguistic resource for NLU data generation of banking app CS dialog system

Jeongwoo Yoon; On-yu Park; Changhoe Hwang; Gwanghoon Yoo; Eric Laporte; Jeesun Nam

arXiv:2605.10241·cs.CL·May 12, 2026

TL;DR

This paper introduces FIAD, a Korean linguistic resource for NLU data generation in banking app customer service, improving intent and entity recognition through pattern-based data synthesis.

Contribution

It constructs a Korean annotated dataset from linguistic patterns encoded as Local Grammar Graphs (LGGs), and uses it to train NLU models for banking dialogues.

Findings

01

DIET+ KorBERT achieved 0.95 intent accuracy.

02

Generated data improved model performance on diverse intents.

03

Linguistic patterns effectively cover Korean request utterances.

Abstract

Natural language understanding (NLU) is integral to task-oriented dialog systems, but it demands a considerable amount of annotated training data to cover diverse utterances. In this study, we report the construction of a linguistic resource named FIAD (Financial Annotated Dataset) and its use to generate Korean annotated training data for NLU in the banking customer service (CS) domain. Through an empirical examination of a corpus of banking app reviews, we identified three linguistic patterns occurring in Korean request utterances: TOPIC (ENTITY, FEATURE), EVENT, and DISCOURSE MARKER. We represented them in LGGs (Local Grammar Graphs) to generate annotated data covering diverse intents and entities. To assess the practicality of the resource, we evaluate the performance of DIET-only (Intent: 0.91 / Topic [entity+feature]: 0.83), DIET+ HANBERT (I: 0.94 / T: 0.85), DIET+ KoBERT…
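To make the pattern-based generation idea concrete, here is a minimal sketch that enumerates utterances from the slot sequence TOPIC (ENTITY, FEATURE), EVENT, DISCOURSE MARKER and records a labeled span for each slot. It is not the FIAD resource or an LGG implementation: the lexicon uses English glosses, and the slot order, the `intent` naming scheme, and the function names are all hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical toy lexicon (English glosses); none of these entries come from FIAD.
ENTITIES = ["account", "card"]
FEATURES = ["transfer limit", "password"]
# Assumption: the intent label is derived from the EVENT slot.
EVENTS = {"change": "request_change", "check": "request_check"}
DISCOURSE_MARKERS = ["please", "if possible"]


def realize(slots):
    """Join slot surfaces with single spaces and record character spans."""
    parts, spans, cursor = [], [], 0
    for label, surface in slots:
        start, end = cursor, cursor + len(surface)
        spans.append({"label": label, "start": start, "end": end})
        parts.append(surface)
        cursor = end + 1  # skip the single-space separator
    return " ".join(parts), spans


def generate_samples():
    """Enumerate every path through the toy pattern, like walking a small graph."""
    samples = []
    for entity, feature, event, marker in product(
        ENTITIES, FEATURES, EVENTS, DISCOURSE_MARKERS
    ):
        text, spans = realize(
            [
                ("ENTITY", entity),
                ("FEATURE", feature),
                ("EVENT", event),
                ("DISCOURSE_MARKER", marker),
            ]
        )
        samples.append({"text": text, "intent": EVENTS[event], "spans": spans})
    return samples


if __name__ == "__main__":
    for sample in generate_samples()[:3]:
        print(sample["intent"], "|", sample["text"])
```

Each generated sample carries its own intent label and slot spans, which is what lets a small hand-built pattern expand into many annotated training examples without manual labeling. A real local grammar would also have to handle Korean particles, endings, and optional paths, which this sketch ignores.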
