Discriminatively-Tuned Generative Classifiers for Robust Natural Language Inference
Xiaoan Ding, Tianyu Liu, Baobao Chang, Zhifang Sui, Kevin Gimpel

TL;DR
This paper introduces GenNLI, a generative classifier for natural language inference that outperforms discriminative and pretrained baselines when training data is limited, noisy, or imbalanced.
Contribution
We propose GenNLI, a novel generative classifier for NLI, and demonstrate its superiority over existing models through new training objectives and extensive empirical evaluation.
Findings
GenNLI outperforms baselines in small data settings
Infinilog loss improves generative classifier training
GenNLI shows robustness to label noise and imbalance
Abstract
While discriminative neural network classifiers are generally preferred, recent work has shown advantages of generative classifiers in terms of data efficiency and robustness. In this paper, we focus on natural language inference (NLI). We propose GenNLI, a generative classifier for NLI tasks, and empirically characterize its performance by comparing it to five baselines, including discriminative models and large-scale pretrained language representation models like BERT. We explore training objectives for discriminative fine-tuning of our generative classifiers, showing improvements over log loss fine-tuning from prior work. In particular, we find strong results with a simple unbounded modification to log loss, which we call the "infinilog loss". Our experiments show that GenNLI outperforms both discriminative and pretrained baselines across several challenging NLI experimental…
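The abstract does not spell out the classifier or the loss, so the following is only a minimal sketch under stated assumptions. It assumes a generative classifier that scores each label y by log p(hypothesis | premise, y) + log p(y) and predicts the highest-scoring label. It also assumes that the "unbounded modification" drops the gold label from the normalizer of log loss; that reading is a guess and is not confirmed by the text above. All function names and the example scores are hypothetical.

```python
import math

def log_sum_exp(values):
    # Numerically stable log(sum(exp(v))) over a list of log-scores.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def predict(joint_scores):
    # joint_scores[y] is assumed to hold log p(hypothesis | premise, y) + log p(y).
    # A generative classifier picks the label with the highest joint score.
    return max(joint_scores, key=joint_scores.get)

def log_loss(joint_scores, gold):
    # Standard discriminative log loss on the normalized posterior.
    # The gold score is part of the normalizer, so the loss is never negative.
    return log_sum_exp(list(joint_scores.values())) - joint_scores[gold]

def unbounded_log_loss(joint_scores, gold):
    # ASSUMPTION: one possible unbounded variant, with the gold label removed
    # from the normalizer. The loss can then fall below zero without limit.
    others = [s for y, s in joint_scores.items() if y != gold]
    return log_sum_exp(others) - joint_scores[gold]

# Hypothetical log-scores for one premise/hypothesis pair.
scores = {"entailment": -41.0, "neutral": -45.0, "contradiction": -48.0}
print(predict(scores))
print(log_loss(scores, "entailment"))
print(unbounded_log_loss(scores, "entailment"))
```

Under this reading, the standard loss flattens out near zero once the gold label dominates, while the unbounded variant keeps rewarding a larger margin between the gold label and the alternatives.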
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Layer Normalization · Dense Connections · Discriminative Fine-Tuning · WordPiece · Multi-Head Attention · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay
