Language Model Pre-training on True Negatives
Zhuosheng Zhang, Hai Zhao, Masao Utiyama, Eiichiro Sumita

TL;DR
This paper identifies the false negative issue in discriminative pre-trained language models and proposes enhanced pre-training methods that focus on true negatives, leading to improved performance and robustness on benchmarks.
Contribution
The paper introduces an approach that mitigates false negatives in PLM pre-training by correcting the harmful gradient updates they cause.
Findings
Improved performance on GLUE and SQuAD benchmarks.
Enhanced robustness of PLMs against false negative issues.
Effective correction of gradient updates during pre-training.
Abstract
Discriminative pre-trained language models (PLMs) learn to predict original texts from intentionally corrupted ones. Taking the original text as the positive sample and the corrupted text as the negative sample, a PLM can be trained effectively for contextualized representation. However, training this type of PLM relies heavily on the quality of the automatically constructed samples. Existing PLMs treat all corrupted texts as equally negative without any examination, so the resulting model inevitably suffers from the false negative issue: training is carried out on pseudo-negative data, which makes the resulting PLMs less efficient and less robust. In this work, after defining the false negative issue in discriminative PLMs, which has long been ignored, we design enhanced pre-training methods to counteract false negative predictions and encourage…
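The false negative issue described above can be illustrated with a small sketch. It is not the authors' method, which the truncated abstract does not spell out. It assumes a masked-token prediction objective, and the synonym table `SYNONYMS`, the toy vocabulary, and the function `per_position_loss` are all hypothetical names introduced here. The idea shown is only this: when the model's prediction is a plausible substitute for the original token, a standard loss still penalizes it as a negative, whereas a corrected loss could skip that position so it produces no gradient.

```python
import math

# Hypothetical synonym table (an assumption for illustration only);
# a real setup would need some lexical resource to decide this.
SYNONYMS = {"big": {"large", "huge"}, "film": {"movie"}}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def per_position_loss(logits_per_pos, gold_tokens, vocab, skip_false_negatives):
    """Cross-entropy per masked position.

    If skip_false_negatives is True, a position whose top prediction is a
    listed synonym of the gold token gets loss 0.0, so it would contribute
    no gradient update.
    """
    losses = []
    for logits, gold in zip(logits_per_pos, gold_tokens):
        probs = softmax(logits)
        predicted = vocab[probs.index(max(probs))]
        is_false_negative = (
            predicted != gold and predicted in SYNONYMS.get(gold, set())
        )
        if skip_false_negatives and is_false_negative:
            losses.append(0.0)
        else:
            losses.append(-math.log(probs[vocab.index(gold)]))
    return losses


vocab = ["big", "large", "cat", "dog"]
# Position 0: gold "big", model prefers "large" (a plausible false negative).
# Position 1: gold "cat", model prefers "dog" (a true negative).
logits = [[1.0, 3.0, 0.0, 0.0], [0.0, 0.0, 1.0, 3.0]]
gold = ["big", "cat"]

standard = per_position_loss(logits, gold, vocab, skip_false_negatives=False)
corrected = per_position_loss(logits, gold, vocab, skip_false_negatives=True)
```

Here `standard` penalizes both positions, while `corrected` zeroes the loss only at the position where the prediction is a listed synonym and leaves the true negative untouched.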
Taxonomy
Topics
Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
