TL;DR
This paper proposes using Semantic Hashing as an embedding method for intent classification, especially effective on small datasets with vocabulary issues and spelling errors, achieving state-of-the-art results on three benchmarks.
Contribution
The paper introduces Semantic Hashing for intent classification, addressing vocabulary dependency and spelling errors, and demonstrates superior performance on multiple small datasets.
Findings
Achieved state-of-the-art performance on AskUbuntu, Chatbot, and Web Application datasets.
Semantic Hashing effectively handles out-of-vocabulary terms and spelling errors.
Outperforms traditional word embedding methods on small intent classification datasets.
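This summary does not spell out how the hashing works, so the following is only a minimal sketch. It assumes a subword scheme in which each token is wrapped in `#` boundary markers and split into character trigrams. The function names (`char_trigrams`, `subword_features`, `jaccard`) are hypothetical and not taken from the paper. It illustrates why a misspelled word can still share most of its features with the correct spelling, whereas a whole-word vocabulary lookup would treat it as unseen.

```python
def char_trigrams(token: str) -> list[str]:
    """Split one token into character trigrams, using '#' as a boundary marker (assumed scheme)."""
    padded = f"#{token}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def subword_features(text: str) -> set[str]:
    """Collect the trigram features of every whitespace-separated token in the text."""
    features: set[str] = set()
    for token in text.lower().split():
        features.update(char_trigrams(token))
    return features


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap between two feature sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


correct = subword_features("install")
misspelled = subword_features("instal")

# As whole words the two spellings share nothing, but their trigram sets overlap heavily.
word_overlap = jaccard({"install"}, {"instal"})
trigram_overlap = jaccard(correct, misspelled)
```

In a full pipeline these trigram sets would typically be turned into sparse count or TF-IDF vectors and passed to a standard classifier; that step is omitted here because this summary does not describe it.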
Abstract
In this paper, we introduce the use of Semantic Hashing as embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for data-hungry state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome such a challenge and learn robust text classification. Current word embedding based methods are dependent on vocabularies. One of the major drawbacks of such methods is out-of-vocabulary terms, especially when having small training datasets and using a wider vocabulary. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise by the use of internet communication. First, such datasets miss a lot of terms in the vocabulary to use word embeddings efficiently.…
