NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models
Myeongjun Jang, Thomas Lukasiewicz

TL;DR
This paper introduces NoiER, a noise entropy regularization method that enhances the reliability of fine-tuned language models in distinguishing out-of-distribution sentences without extra data or models.
Contribution
We propose NoiER, a novel regularization technique that mitigates distribution collapse in pretrained language models during fine-tuning, improving OOD detection performance.
Findings
55% average improvement in OOD detection metrics
Effective without auxiliary models or additional data
Addresses distribution collapse in fine-tuned models
Abstract
The recent development in pretrained language models trained in a self-supervised fashion, such as BERT, is driving rapid progress in the field of NLP. However, their brilliant performance is based on leveraging syntactic artifacts of the training data rather than fully understanding the intrinsic meaning of language. The excessive exploitation of spurious artifacts causes a problematic issue: the distribution collapse problem, which is the phenomenon that the model fine-tuned on downstream tasks is unable to distinguish out-of-distribution (OOD) sentences while producing a high confidence score. In this paper, we argue that distribution collapse is a prevalent issue in pretrained language models and propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data. The proposed approach improved…
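The abstract names the idea but does not give the loss, so the following is only a minimal sketch of what an entropy-style regulariser on noise inputs could look like. It is not the authors' implementation. It assumes the regulariser is a KL term that pushes the prediction on a noise input toward the uniform distribution, added to the usual cross-entropy on the clean input. All function names and the weight `lam` are hypothetical.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Standard classification loss on an in-distribution example.
    return -math.log(max(softmax(logits)[label], 1e-12))

def kl_uniform_to_pred(logits):
    # KL(U || p): zero when the prediction p is uniform,
    # larger when p is confidently peaked on one class.
    probs = softmax(logits)
    u = 1.0 / len(probs)
    return sum(u * math.log(u / max(p, 1e-12)) for p in probs)

def regularised_loss(clean_logits, label, noise_logits, lam=1.0):
    # Assumed form: task loss on the clean input plus a penalty for
    # confident predictions on a noise input. `lam` is a hypothetical weight.
    return cross_entropy(clean_logits, label) + lam * kl_uniform_to_pred(noise_logits)

def max_softmax_score(logits):
    # Common confidence score for OOD detection: low values suggest OOD.
    return max(softmax(logits))

# A confident prediction on a noise input is penalised more than a flat one.
peaked = regularised_loss([4.0, 0.0, 0.0], 0, noise_logits=[6.0, 0.0, 0.0])
flat = regularised_loss([4.0, 0.0, 0.0], 0, noise_logits=[0.1, 0.0, 0.0])
```

Under these assumptions, a model trained with such a penalty would tend to give lower `max_softmax_score` values on noise-like inputs, which is the behaviour a confidence-threshold OOD detector relies on.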
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Linear Layer · WordPiece · Adam · Attention Dropout · Residual Connection · Weight Decay · Dropout · Dense Connections · Softmax
