Effectiveness of Pre-training for Few-shot Intent Classification

Haode Zhang; Yuwei Zhang; Li-Ming Zhan; Jiaxin Chen; Guangyuan Shi; Albert Y.S. Lam; Xiao-Ming Wu

arXiv:2109.05782 · cs.CL · September 17, 2024

Open Access

TL;DR

This paper demonstrates that fine-tuning BERT with a small labeled dataset is highly effective for few-shot intent classification, outperforming more complex pre-training methods on diverse domains.

Contribution

It introduces IntentBERT, a pre-trained model obtained by simply fine-tuning BERT on a small set of labeled utterances, which achieves superior few-shot intent classification performance across various domains.

Findings

01. Fine-tuning BERT with ~1,000 labeled examples outperforms existing pre-trained models.

02. IntentBERT generalizes well across different domains with minimal labeled data.

03. Simple fine-tuning is highly effective for few-shot intent classification.

Abstract

This paper investigates the effectiveness of pre-training for few-shot intent classification. While existing paradigms commonly further pre-train language models such as BERT on a large unlabeled corpus, we find it highly effective and efficient to simply fine-tune BERT with a small set of labeled utterances from public datasets. Specifically, fine-tuning BERT with roughly 1,000 labeled utterances yields a pre-trained model, IntentBERT, which can easily surpass the performance of existing pre-trained models for few-shot intent classification on novel domains with very different semantics. The high effectiveness of IntentBERT confirms the feasibility and practicality of few-shot intent detection, and its high generalization ability across different domains suggests that intent classification tasks may share a similar underlying structure, which can be efficiently learned from a…
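
To make the few-shot setting concrete, here is a minimal sketch of classifying a query utterance from a handful of labeled examples per intent. It is an illustration under stated assumptions, not the paper's procedure: the abstract does not say which classifier is applied on top of the encoder, so a nearest-centroid rule with cosine similarity is swapped in here. The 3-dimensional vectors stand in for utterance embeddings that a fine-tuned encoder would produce, and the intent names and helper functions are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_centroids(support):
    """Average the support embeddings of each intent and normalize the mean.

    support: dict mapping an intent label to a list of embedding vectors.
    """
    centroids = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        centroids[label] = l2_normalize(mean)
    return centroids


def predict_intent(query, centroids):
    """Return the intent whose centroid has the highest cosine similarity."""
    q = l2_normalize(query)
    return max(
        centroids,
        key=lambda label: sum(a * b for a, b in zip(q, centroids[label])),
    )


# Toy embeddings (assumed values), two labeled examples per intent.
support = {
    "book_flight": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "check_balance": [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
}
centroids = class_centroids(support)
prediction = predict_intent([0.7, 0.3, 0.0], centroids)
print(prediction)
```

In a real pipeline the toy vectors would be replaced by encoder outputs for each utterance. The quality of those embeddings is what the abstract attributes to fine-tuning on labeled utterances.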

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Interpreting and Communication in Healthcare

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Dropout · Dense Connections