Enhancing Chinese Intent Classification by Dynamically Integrating Character Features into Word Embeddings with Ensemble Techniques
Ruixi Lin, Charles Costello, Charles Jankowski

TL;DR
This paper introduces a novel method for Chinese intent classification that dynamically combines character and word embeddings with contextual and ensemble techniques, improving accuracy without external pre-trained resources.
Contribution
It presents a generic, low-effort approach to integrate Chinese character features into word embeddings and employs ensemble methods, enhancing Chinese intent classification performance.
Findings
Outperforms baseline models and existing methods.
Effectively leverages Chinese character information.
Does not rely on external pre-trained embeddings.
Abstract
Intent classification has been widely studied on English data using deep learning approaches based on neural networks and word embeddings. The challenge for Chinese intent classification stems from the fact that, unlike English, where most words are made up of 26 phonologic alphabet letters, Chinese is logographic: a Chinese character is a more basic semantic unit that can be informative on its own, and its meaning does not vary much across contexts. Chinese word embeddings alone can be inadequate for representing words, and pre-trained embeddings can align poorly with the task at hand. To address this inadequacy and leverage Chinese character information, we propose a low-effort, generic way to dynamically integrate character-embedding-based feature maps with word-embedding-based inputs; the resulting word-character embeddings are stacked with a contextual…
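The abstract is cut off before it gives architectural details, so the following is only a minimal sketch of one plausible way to combine character-based feature maps with word embeddings. It is not the authors' implementation. The convolution-plus-max-pooling step, the concatenation, all dimensions, and the names `conv_max_pool` and `word_char_embedding` are assumptions made for illustration. The embeddings here are random placeholders; in a real model they would be trainable parameters updated jointly with the classifier.

```python
import random


def conv_max_pool(char_vecs, filters, width):
    """Slide each filter over windows of `width` consecutive character
    vectors and keep the maximum response per filter (max-over-time pooling)."""
    dim = len(char_vecs[0])
    # Pad short words with zero vectors so at least one window exists.
    padded = char_vecs + [[0.0] * dim] * max(0, width - len(char_vecs))
    features = []
    for f in filters:  # each filter is a flat list of width * dim weights
        responses = []
        for start in range(len(padded) - width + 1):
            window = [x for vec in padded[start:start + width] for x in vec]
            responses.append(sum(w * x for w, x in zip(f, window)))
        features.append(max(responses))
    return features


def word_char_embedding(word, word_emb, char_emb, filters, width=2):
    """Concatenate a word's embedding with the feature map computed
    from the embeddings of its characters (hypothetical combination rule)."""
    char_vecs = [char_emb[c] for c in word]
    return word_emb[word] + conv_max_pool(char_vecs, filters, width)


# Toy setup: random values stand in for learned parameters (assumption).
rng = random.Random(0)
WORD_DIM, CHAR_DIM, N_FILTERS, WIDTH = 4, 3, 2, 2


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


words = ["天气", "怎么样"]  # an already-segmented utterance
word_emb = {w: rand_vec(WORD_DIM) for w in words}
char_emb = {c: rand_vec(CHAR_DIM) for w in words for c in w}
filters = [rand_vec(WIDTH * CHAR_DIM) for _ in range(N_FILTERS)]

# One word-character vector per word, each of length WORD_DIM + N_FILTERS.
sentence = [word_char_embedding(w, word_emb, char_emb, filters, WIDTH) for w in words]
```

Each word ends up with a vector of length `WORD_DIM + N_FILTERS`, which a downstream contextual layer could consume. Concatenation is chosen here only because it keeps the word-level and character-level signals separate; other combination rules are equally compatible with the abstract's description.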
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
