WARP: Guaranteed Inner-Layer Repair of NLP Transformers

Hsin-Ling Hsu; Min-Yu Chen; Nai-Chia Chen; Yan-Ru Chen; Yi-Ling Chang; Fang Yu

arXiv:2604.00938 · cs.LG · April 2, 2026

TL;DR

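WARP (Weight-Adjusted Repair with Provability) is a constraint-based repair framework for Transformer NLP models that extends repair beyond the last layer. It casts repair as a convex quadratic program whose per-sample guarantees hold as long as the first-order approximation of the logit gap holds.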

Contribution

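WARP formulates repair as a convex quadratic program derived from a first-order linearization of the logit gap. This keeps optimization tractable over a high-dimensional parameter space that includes layers before the final one, while still providing repair guarantees.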

Findings

01

WARP achieves practical robustness guarantees on encoder-only Transformers.

02

The method extends repair capabilities to multiple layers, not just the last.

03

Empirical results show improved adversarial robustness while maintaining correctness.

Abstract

Transformer-based NLP models remain vulnerable to adversarial perturbations, yet existing repair methods face a fundamental trade-off: gradient-based approaches offer flexibility but lack verifiability and often overfit; methods that do provide repair guarantees are restricted to the final layer or small networks, significantly limiting the parameter search space available for repair. We present WARP (Weight-Adjusted Repair with Provability), a constraint-based repair framework that extends repair beyond the last layer of Transformer models. WARP formulates repair as a convex quadratic program derived from a first-order linearization of the logit gap, enabling tractable optimization over a high-dimensional parameter space. Under the condition that the first-order approximation holds, this formulation induces three per-sample guarantees: (i) a positive margin constraint ensuring correct…
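To make the linearized formulation concrete, here is a minimal sketch of the simplest case: one sample and one positive-margin constraint. It is not the authors' implementation. The function names `repair_step` and `linearized_gap`, the toy numbers, and the minimum-norm objective are assumptions chosen for illustration; the paper's actual program covers further guarantees that the truncated abstract does not spell out. With a single linear constraint, the convex quadratic program has a closed-form solution, so no solver is needed.

```python
# Hypothetical single-sample illustration, not the paper's code.
# Solves:  minimize 0.5 * ||delta||^2
#          subject to  gap + grad . delta >= margin
# where `gap` is the current logit gap (correct logit minus best wrong logit)
# and `grad` is its gradient with respect to the weights being repaired.

def linearized_gap(grad, gap, delta):
    """First-order estimate of the logit gap after the weight update delta."""
    return gap + sum(g * d for g, d in zip(grad, delta))


def repair_step(grad, gap, margin):
    """Minimum-norm update that satisfies the linearized margin constraint."""
    shortfall = margin - gap
    if shortfall <= 0.0:
        # Constraint already holds, so the optimal update is zero.
        return [0.0 for _ in grad]
    sq_norm = sum(g * g for g in grad)
    if sq_norm == 0.0:
        raise ValueError("zero gradient: the linearized gap cannot be changed")
    # KKT conditions give delta = lam * grad with lam = shortfall / ||grad||^2.
    lam = shortfall / sq_norm
    return [lam * g for g in grad]


# Toy example: a misclassified sample (negative gap) with an assumed gradient.
grad = [3.0, 4.0]
gap = -1.0
margin = 0.5

delta = repair_step(grad, gap, margin)
new_gap = linearized_gap(grad, gap, delta)
print(delta, new_gap)
```

The update moves the weights along the gradient of the gap by just enough to reach the margin under the linear model. Whether the true logit gap also reaches the margin depends on how well the first-order approximation holds at that update, which is the condition the abstract attaches to its guarantees. With many samples the constraints are stacked and a general QP solver replaces the closed form.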
