How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data?
Huazheng Wang, Daixuan Cheng, Haifeng Sun, Jingyu Wang, Qi Qi, Jianxin Liao, Jing Wang, Cong Liu

TL;DR
This paper investigates how applying diffusion processes to pretrained language models affects their robustness to out-of-distribution data, revealing that diffusion can both impair reconstruction ability and enhance OOD detection.
Contribution
It provides the first comprehensive analysis of diffusion's impact on PLMs' OOD robustness, including evaluation of reconstruction and detection capabilities.
Findings
Diffusion training degrades OOD reconstruction ability.
Diffusion models improve OOD sample detection, achieving state-of-the-art accuracy.
Diffusion reduces overall OOD robustness of PLMs.
Abstract
Transformer-based pretrained language models (PLMs) have achieved great success in modern NLP. An important advantage of PLMs is their good out-of-distribution (OOD) robustness. Recently, the success of diffusion models has prompted considerable work on applying diffusion to PLMs. However, how diffusion influences PLMs on OOD data remains under-explored. The core of diffusion models is a forward diffusion process, which gradually applies Gaussian noise to inputs, and a reverse denoising process, which removes the noise. Reconstructing the noised input is a fundamental ability of diffusion models. We directly analyze OOD robustness by measuring the reconstruction loss, testing both the ability to reconstruct OOD data and the ability to detect OOD samples. Our experiments on eight datasets analyze different training parameters and statistical features of the data. They show that finetuning PLMs with diffusion degrades the…
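The forward noising step and the reconstruction-loss score described in the abstract can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes a DDPM-style closed-form forward process with a linear noise schedule, uses mean squared error as the reconstruction loss, and substitutes a hypothetical `toy_denoiser` for a diffusion-finetuned PLM. The names `alpha_bar`, `forward_noise`, `ood_score`, `is_ood`, and every parameter value are assumptions made for the example.

```python
import math
import random

def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) over the first t steps.

    The linear beta schedule and its endpoints are assumptions for this sketch.
    """
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def forward_noise(x0, t, rng, num_steps=1000):
    """Sample x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps with eps ~ N(0, I)."""
    a = alpha_bar(t, num_steps)
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]

def reconstruction_loss(x0, x_hat):
    """Mean squared error between the clean input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x0, x_hat)) / len(x0)

def ood_score(x0, denoiser, t, rng, num_steps=1000):
    """Noise the input, denoise it, and score it by reconstruction loss."""
    x_t = forward_noise(x0, t, rng, num_steps)
    return reconstruction_loss(x0, denoiser(x_t, t))

def is_ood(score, threshold):
    """Flag a sample as OOD when its reconstruction loss exceeds the threshold."""
    return score > threshold

def toy_denoiser(x_t, t):
    """Hypothetical stand-in for a trained model.

    It always predicts a fixed in-distribution prototype, so inputs far from
    the prototype reconstruct poorly. A real denoiser would depend on x_t and t.
    """
    return [1.0] * len(x_t)
```

With this toy denoiser, an input equal to the prototype scores 0.0 and an input far from it scores much higher, which is the intuition behind thresholding the reconstruction loss to detect OOD samples. In practice the threshold would be chosen on held-out in-distribution data.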
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Diffusion
