Machines Getting with the Program: Understanding Intent Arguments of Non-Canonical Directives
Won Ik Cho, Young Ki Moon, Sangwhan Moon, Seok Min Kim, Nam Soo Kim

TL;DR
This paper introduces a new Korean corpus of 50K question/command-intent pairs for understanding non-canonical directives in dialogue systems, along with a method for mitigating class imbalance, and shows that the approach could be extended to other languages.
Contribution
It presents novel corpus-creation guidelines, a large Korean dataset, and a method for mitigating class imbalance in intent classification for non-canonical speech.
Findings
Constructed a 50K-instance Korean intent dataset
Proposed a class imbalance mitigation method
Demonstrated potential for multilingual extension
Abstract
Modern dialog managers face the challenge of having to fulfill human-level conversational skills as part of common user expectations, including but not limited to discourse with no clear objective. Along with these requirements, agents are expected to extrapolate intent from the user's dialogue even when subjected to non-canonical forms of speech. This depends on the agent's comprehension of paraphrased forms of such utterances. Especially in low-resource languages, the lack of data is a bottleneck that prevents advancements of the comprehension performance for these types of agents. In this regard, here we demonstrate the necessity of extracting the intent argument of non-canonical directives in a natural language format, which may yield more accurate parsing, and suggest guidelines for building a parallel corpus for this purpose. Following the guidelines, we construct a Korean corpus…
