Noisy Channel Language Model Prompting for Few-Shot Text Classification
Sewon Min, Mike Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer

TL;DR
This paper proposes a noisy channel approach for few-shot text classification: instead of scoring the label given the input, it computes the probability of the input given the label. This improves stability and performance over direct models, especially when training data is limited.
Contribution
It introduces a noisy channel prompting method for few-shot learning that is more stable and more accurate than direct models, and it offers practical guidelines for when to use it.
Findings
Channel models outperform direct models in few-shot settings.
Channel prompt tuning is preferable when the training set is small or its labels are imbalanced.
Compared with direct models, the channel approach shows lower variance and higher worst-case accuracy.
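To make "lower variance and higher worst-case accuracy" concrete, the sketch below summarizes a set of runs by mean, standard deviation, and minimum accuracy. The helper name `stability_summary` and the accuracy lists are hypothetical and made up for illustration; they are not results from the paper.

```python
import statistics
from typing import Dict, List


def stability_summary(accuracies: List[float]) -> Dict[str, float]:
    """Summarize accuracies from repeated runs (e.g. different seeds or demonstrations)."""
    return {
        "mean": statistics.mean(accuracies),
        "std": statistics.pstdev(accuracies),  # population standard deviation
        "worst": min(accuracies),              # worst-case accuracy across runs
    }


# Made-up accuracies for five hypothetical runs; NOT numbers from the paper.
direct_runs = [0.50, 0.85, 0.62, 0.90, 0.55]
channel_runs = [0.78, 0.82, 0.80, 0.84, 0.79]

for name, runs in [("direct", direct_runs), ("channel", channel_runs)]:
    print(name, stability_summary(runs))
```

A method counts as more stable in this sense when its `std` is smaller and its `worst` is larger, even if the best single run of the other method is higher.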
Abstract
We introduce a noisy channel approach for language model prompting in few-shot text classification. Instead of computing the likelihood of the label given the input (referred to as direct models), channel models compute the conditional probability of the input given the label, and are thereby required to explain every word in the input. We use channel models for recently proposed few-shot learning methods with no or very limited updates to the language model parameters, via either in-context demonstration or prompt tuning. Our experiments show that, for both methods, channel models significantly outperform their direct counterparts, which we attribute to their stability, i.e., lower variance and higher worst-case accuracy. We also present extensive ablations that provide recommendations for when to use channel prompt tuning instead of other competitive methods (e.g., direct head tuning):…
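The difference between the two scoring directions can be sketched in a few lines. A direct model picks the label y that maximizes P(y | x); a channel model picks the label that maximizes P(x | y) P(y), so every token of the input x has to be scored. The code below is a minimal illustration, not the authors' implementation: the `LogProbFn` interface, the toy lookup-table "language model" from `make_toy_log_prob`, the verbalizer words, the uniform label prior default, and all probabilities are assumptions chosen for the example.

```python
import math
from typing import Callable, Dict, Optional

# Hypothetical interface: returns log P(continuation | context) under some language model.
LogProbFn = Callable[[str, str], float]


def direct_predict(x: str, verbalizers: Dict[str, str], log_prob: LogProbFn) -> str:
    """Direct model: argmax over labels y of log P(verbalizer(y) | x)."""
    return max(verbalizers, key=lambda y: log_prob(verbalizers[y], x))


def channel_predict(
    x: str,
    verbalizers: Dict[str, str],
    log_prob: LogProbFn,
    log_prior: Optional[Dict[str, float]] = None,
) -> str:
    """Channel model: argmax over labels y of log P(x | verbalizer(y)) + log P(y)."""
    if log_prior is None:
        # Assumption for this sketch: uniform prior over labels.
        log_prior = {y: -math.log(len(verbalizers)) for y in verbalizers}
    return max(verbalizers, key=lambda y: log_prob(x, verbalizers[y]) + log_prior[y])


def make_toy_log_prob(table: Dict[str, Dict[str, float]], floor: float = 1e-3) -> LogProbFn:
    """Toy stand-in for a language model: one token table per context, tokens scored independently."""
    def log_prob(continuation: str, context: str) -> float:
        probs = table.get(context, {})
        return sum(math.log(probs.get(tok, floor)) for tok in continuation.split())
    return log_prob


# Made-up probabilities for illustration only.
TABLE = {
    "great": {"a": 0.2, "fun": 0.1, "film": 0.1, "dull": 0.005},
    "terrible": {"a": 0.2, "fun": 0.005, "film": 0.1, "dull": 0.1},
    "a dull film": {"great": 0.3, "terrible": 0.6},
}
VERBALIZERS = {"positive": "great", "negative": "terrible"}
toy_lm = make_toy_log_prob(TABLE)

print(direct_predict("a dull film", VERBALIZERS, toy_lm))
print(channel_predict("a dull film", VERBALIZERS, toy_lm))
```

In the direct call the model scores only the short verbalizer, while in the channel call it scores all three input tokens conditioned on the verbalizer. With a real language model, `log_prob` would be replaced by a sum of token log-probabilities from the model.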
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Domain Adaptation and Few-Shot Learning
