IM-BERT: Enhancing Robustness of BERT through the Implicit Euler Method
Mihyeon Kim, Juhyoung Park, Youngbin Kim

TL;DR
IM-BERT enhances BERT's robustness against adversarial attacks by modeling its layers as solutions to ODEs and applying the implicit Euler method, leading to significant performance improvements without extra parameters.
Contribution
This paper introduces IM-BERT, a novel approach that conceptualizes BERT layers as ODE solutions and applies the implicit Euler method to improve robustness against adversarial attacks.
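As a rough sketch of the idea, the two Euler schemes can be written side by side. The notation here (hidden state h_t at layer t, layer function f, step size γ) is assumed for illustration and is not taken from the paper. The explicit Euler update has the same form as a standard residual connection, while the implicit update has h_{t+1} on both sides and so generally has to be solved iteratively. This summary does not say how IM-BERT solves it.

```latex
\begin{aligned}
\text{ODE view:}\quad & \frac{dh(t)}{dt} = f\bigl(h(t)\bigr), \\
\text{explicit Euler (residual form):}\quad & h_{t+1} = h_t + \gamma\, f(h_t), \\
\text{implicit Euler:}\quad & h_{t+1} = h_t + \gamma\, f(h_{t+1}).
\end{aligned}
```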
Findings
IM-BERT outperforms BERT by 8.3% on AdvGLUE.
IM-BERT achieves 5.9% higher accuracy than BERT in low-resource scenarios.
The implicit Euler approach enhances numerical stability and robustness.
Abstract
Pre-trained Language Models (PLMs) have achieved remarkable performance on diverse NLP tasks through pre-training and fine-tuning. However, fine-tuning the model with a large number of parameters on limited downstream datasets often leads to vulnerability to adversarial attacks, causing overfitting of the model on standard datasets. To address these issues, we propose IM-BERT from the perspective of a dynamic system by conceptualizing a layer of BERT as a solution of Ordinary Differential Equations (ODEs). Under the situation of initial value perturbation, we analyze the numerical stability of two main numerical ODE solvers: the explicit and implicit Euler approaches. Based on these analyses, we introduce a numerically robust IM-connection incorporating BERT's layers. This strategy enhances the robustness of PLMs against adversarial attacks, even in low-resource scenarios, without…
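The stability argument in the abstract can be illustrated on a toy problem. The sketch below is not the authors' code: it assumes the scalar linear test equation dx/dt = λx, where the implicit Euler step has a closed form, and all names and values (`lam`, `h`, `n_steps`, the perturbation `eps`) are hypothetical choices for illustration. It tracks how a small perturbation of the initial value evolves under each solver.

```python
def explicit_euler_step(x, lam, h):
    # Explicit Euler: x_{t+1} = x_t + h * f(x_t), with f(x) = lam * x.
    return x + h * lam * x


def implicit_euler_step(x, lam, h):
    # Implicit Euler: x_{t+1} = x_t + h * f(x_{t+1}).
    # For the linear f(x) = lam * x this can be solved in closed form.
    return x / (1.0 - h * lam)


def perturbation_gap(step, x0, eps, lam, h, n_steps):
    # Run the same solver from a clean and a perturbed initial value
    # and return the absolute gap between the two trajectories at the end.
    clean, perturbed = x0, x0 + eps
    for _ in range(n_steps):
        clean = step(clean, lam, h)
        perturbed = step(perturbed, lam, h)
    return abs(perturbed - clean)


# Illustrative values only: a stiff decay rate and a deliberately large step.
lam, h, n_steps = -3.0, 1.0, 12
explicit_gap = perturbation_gap(explicit_euler_step, 1.0, 1e-2, lam, h, n_steps)
implicit_gap = perturbation_gap(implicit_euler_step, 1.0, 1e-2, lam, h, n_steps)

print(f"explicit Euler gap after {n_steps} steps: {explicit_gap:.3e}")
print(f"implicit Euler gap after {n_steps} steps: {implicit_gap:.3e}")
```

With these values each explicit step multiplies the perturbation by |1 + hλ| = 2, so it grows, while each implicit step multiplies it by 1 / (1 - hλ) = 0.25, so it shrinks. A real BERT layer is nonlinear, so the implicit update has no closed form and this toy case only conveys the intuition.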
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Attention Dropout · Softmax · Residual Connection · WordPiece · Linear Layer
