TL;DR
This paper introduces a meta-learning approach to few-shot spoken intent recognition that learns task-agnostic utterance representations, so that novel intents can be classified from only a few labeled examples. It is evaluated on Google Commands and Fluent Speech Commands.
Contribution
It proposes a novel representation-based meta-learning framework for few-shot spoken intent detection, demonstrating competitive performance with limited training samples.
Findings
Achieves 88.6% average accuracy in 5-shot classification of novel classes on Google Commands
Achieves 78.5% average accuracy in 5-shot classification of novel classes on Fluent Speech Commands
Performance is comparable to that of traditional supervised models trained on more data
Abstract
Spoken intent detection has become a popular way to interface with various smart devices with ease. However, such systems are limited to a preset list of intents (terms or commands), which restricts the quick customization of personal devices to new intents. This paper presents a few-shot spoken intent classification approach with task-agnostic representations via the meta-learning paradigm. Specifically, we leverage popular representation-based meta-learning to build a task-agnostic representation of utterances, which is then passed to a linear classifier for prediction. We evaluate three such approaches on our novel experimental protocol developed on two popular spoken intent classification datasets: Google Commands and the Fluent Speech Commands dataset. For a 5-shot (1-shot) classification of novel classes, the proposed framework provides an average classification accuracy of…
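To make the "representation plus linear classifier" idea in the abstract concrete, here is a minimal sketch of one N-way K-shot episode over precomputed utterance embeddings. It is an illustration under stated assumptions, not the paper's implementation: the embeddings are synthetic, the function names (`sample_episode`, `fit_ridge_classifier`, `predict`, `run_demo`) are hypothetical, and the closed-form ridge-regression classifier is only one possible linear classifier; the abstract does not say which one the authors use.

```python
import numpy as np


def sample_episode(features, labels, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode: a support set and a query set."""
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    sx, sy, qx, qy = [], [], [], []
    for episode_label, cls in enumerate(classes):
        # Shuffle the examples of this class, then split support / query.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        support_idx = idx[:k_shot]
        query_idx = idx[k_shot:k_shot + n_query]
        sx.append(features[support_idx])
        qx.append(features[query_idx])
        sy += [episode_label] * len(support_idx)
        qy += [episode_label] * len(query_idx)
    return np.concatenate(sx), np.array(sy), np.concatenate(qx), np.array(qy)


def fit_ridge_classifier(support_x, support_y, n_way, reg=1.0):
    """Fit a linear classifier by ridge regression onto one-hot targets.

    Assumption: ridge regression is used here for illustration only.
    """
    x = np.hstack([support_x, np.ones((len(support_x), 1))])  # bias column
    targets = np.eye(n_way)[support_y]
    # Solve (X^T X + reg * I) W = X^T Y in closed form.
    return np.linalg.solve(x.T @ x + reg * np.eye(x.shape[1]), x.T @ targets)


def predict(weights, query_x):
    """Return the highest-scoring episode label for each query embedding."""
    x = np.hstack([query_x, np.ones((len(query_x), 1))])
    return np.argmax(x @ weights, axis=1)


def run_demo(seed=0):
    """Run one 5-way 5-shot episode on synthetic, well-separated embeddings."""
    rng = np.random.default_rng(seed)
    n_classes, per_class, dim = 10, 20, 16
    # Synthetic stand-in for the output of a frozen, task-agnostic encoder.
    centers = rng.normal(scale=5.0, size=(n_classes, dim))
    labels = np.repeat(np.arange(n_classes), per_class)
    features = centers[labels] + rng.normal(scale=0.5, size=(len(labels), dim))

    sx, sy, qx, qy = sample_episode(features, labels, 5, 5, 10, rng)
    weights = fit_ridge_classifier(sx, sy, n_way=5)
    accuracy = float(np.mean(predict(weights, qx) == qy))
    return accuracy, sx.shape, qx.shape


if __name__ == "__main__":
    acc, support_shape, query_shape = run_demo()
    print(f"support {support_shape}, query {query_shape}, accuracy {acc:.3f}")
```

In this sketch the encoder is kept fixed and only the small linear layer is refit for each episode, which is what lets a new set of intents be learned from a handful of utterances. Averaging the accuracy over many sampled episodes would give a figure analogous to the average few-shot accuracy the abstract reports.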
