Healing Powers of BERT: How Task-Specific Fine-Tuning Recovers Corrupted Language Models
Shijie Han, Zhenyu Zhang, Andrei Arsene Simion

TL;DR
This paper investigates how well task-specific fine-tuning recovers BERT's performance after parameter corruption. It finds that corruption in lower layers causes more damage than corruption in upper layers and that recovery is limited, with implications for building more robust models.
Contribution
It introduces a systematic study of BERT's robustness to parameter corruption and how fine-tuning recovers performance, highlighting the importance of layer-specific effects.
Findings
Corrupted models struggle to fully recover original performance.
Lower-layer corruption is more damaging than upper-layer corruption.
Fine-tuning offers limited recovery after significant corruption.
Abstract
Language models like BERT excel at sentence classification tasks due to extensive pre-training on general data, but their robustness to parameter corruption remains unexplored. To understand this better, we look at what happens if a language model is "broken", in the sense that some of its parameters are corrupted and then recovered by fine-tuning. Strategically corrupting BERT variants at different levels, we find that corrupted models struggle to fully recover their original performance, with higher corruption causing more severe degradation. Notably, corrupting the bottom layers, which capture fundamental linguistic features, is more detrimental than corrupting the top layers. Our insights contribute to understanding language model robustness and adaptability under adverse conditions, informing strategies for developing NLP systems that are resilient to parameter perturbations.
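The abstract does not say exactly how parameters are corrupted. The following Python sketch is for illustration only. It assumes that "corruption" means overwriting a chosen fraction of the weights in selected encoder layers with Gaussian noise. It also assumes that parameters are stored in a dict keyed by Hugging Face-style names such as `encoder.layer.3.attention.self.query.weight`. The function `corrupt_layers` and the toy model are hypothetical, and the scheme used by the authors may differ. Passing `layers=range(0, 4)` versus `layers=range(8, 12)` mirrors the bottom- versus top-layer contrast described above. The corrupted copy would then be fine-tuned on the downstream task with an ordinary training loop, which is not shown.

```python
import random
import re
from typing import Dict, Iterable, List

# Assumed naming convention (hypothetical here): keys like
# "encoder.layer.3.attention.self.query.weight", as in Hugging Face BERT checkpoints.
LAYER_RE = re.compile(r"encoder\.layer\.(\d+)\.")


def corrupt_layers(
    params: Dict[str, List[float]],
    layers: Iterable[int],
    fraction: float,
    std: float = 0.02,
    seed: int = 0,
) -> Dict[str, List[float]]:
    """Return a copy of `params` where `fraction` of the entries of every tensor
    in the given encoder `layers` is replaced by noise drawn from N(0, std^2).

    Assumption: this noise-replacement scheme stands in for the paper's
    corruption method, which may differ.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    targets = set(layers)
    corrupted: Dict[str, List[float]] = {}
    for name, values in params.items():
        values = list(values)  # copy so the clean model is left untouched
        match = LAYER_RE.search(name)
        if match and int(match.group(1)) in targets:
            k = round(fraction * len(values))
            # Pick k distinct positions in this tensor and overwrite them with noise.
            for idx in rng.sample(range(len(values)), k):
                values[idx] = rng.gauss(0.0, std)
        corrupted[name] = values
    return corrupted


def changed(params: Dict[str, List[float]], clean_value: float = 1.0) -> int:
    """Count entries that differ from the toy model's constant clean value."""
    return sum(v != clean_value for vals in params.values() for v in vals)


# Toy "model": 12 encoder layers with one 8-entry weight each, plus embeddings.
toy = {f"encoder.layer.{i}.output.dense.weight": [1.0] * 8 for i in range(12)}
toy["embeddings.word_embeddings.weight"] = [1.0] * 8

bottom = corrupt_layers(toy, layers=range(0, 4), fraction=0.5)  # bottom-layer corruption
top = corrupt_layers(toy, layers=range(8, 12), fraction=0.5)    # top-layer corruption
print(changed(bottom), changed(top))
```

Because the function returns a copy, the same clean checkpoint can be corrupted at several levels (different `fraction` values) or in different layer ranges. This makes it easy to run side-by-side recovery experiments.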
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: WordPiece · Residual Connection · Softmax · Layer Normalization · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dropout · Adam
