Monotonicity as an Architectural Bias for Robust Language Models

Patrick Cooper; Alireza Nadali; Ashutosh Trivedi; Alvaro Velasquez

arXiv:2602.02686 · cs.CL · February 4, 2026

TL;DR

This paper introduces a selective monotonicity bias into Transformer-based language models, substantially improving robustness against adversarial attacks (attack success rate drops from 69% to 19%) while largely maintaining performance on standard tasks.

Contribution

The paper demonstrates that enforcing monotonicity in specific model components improves robustness without sacrificing expressivity or accuracy.

Findings

1. Adversarial attack success rate drops from 69% to 19%.
2. Monotonic models maintain performance on standard tasks.
3. Selective monotonicity enhances robustness with minimal performance loss.
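For scale, the headline attack-success-rate figure can be read both as an absolute and as a relative reduction. This is simple arithmetic on the two reported numbers, not an additional result from the paper:

```latex
\Delta_{\text{abs}} = 69\% - 19\% = 50 \text{ percentage points},
\qquad
\Delta_{\text{rel}} = \frac{69 - 19}{69} = \frac{50}{69} \approx 0.725
```

In other words, roughly 72% of the attacks that previously succeeded no longer do, assuming the same set of attack attempts is used for both measurements.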

Abstract

Large language models (LLMs) are known to exhibit brittle behavior under adversarial prompts and jailbreak attacks, even after extensive alignment and fine-tuning. This fragility reflects a broader challenge of modern neural language models: small, carefully structured perturbations in high-dimensional input spaces can induce large and unpredictable changes in internal semantic representations and output. We investigate monotonicity as an architectural inductive bias for improving the robustness of Transformer-based language models. Monotonicity constrains semantic transformations so that strengthening information, evidence, or constraints cannot lead to regressions in the corresponding internal representations. Such order-preserving behavior has long been exploited in control and safety-critical systems to simplify reasoning and improve robustness, but has traditionally been viewed…
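To make the order-preserving property concrete, here is a minimal sketch of one standard way to build a monotone map: nonnegative effective weights followed by a nondecreasing activation. This is an assumed, generic construction and not the paper's architecture. The summary does not specify which components are constrained or how, and `MonotoneLayer`, `softplus`, and `is_coordinatewise_leq` are hypothetical names used only for illustration. It uses only the Python standard library.

```python
import math
import random


def softplus(z: float) -> float:
    """Numerically stable log(1 + e^z); always >= 0, so weights stay nonnegative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def is_coordinatewise_leq(a: list[float], b: list[float]) -> bool:
    """True if a[i] <= b[i] for every i (the partial order being preserved)."""
    return all(ai <= bi for ai, bi in zip(a, b))


class MonotoneLayer:
    """Hypothetical monotone layer: y_j = tanh(b_j + sum_i softplus(V_ji) * x_i).

    softplus keeps every effective weight nonnegative and tanh is nondecreasing,
    so each output is nondecreasing in each input coordinate.
    """

    def __init__(self, n_in: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Raw parameters are unconstrained; positivity is imposed in forward().
        self.raw = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        # The bias may take any sign: adding a constant does not affect monotonicity.
        self.bias = [rng.gauss(0.0, 1.0) for _ in range(n_out)]

    def forward(self, x: list[float]) -> list[float]:
        out = []
        for row, b in zip(self.raw, self.bias):
            z = b + sum(softplus(v) * xi for v, xi in zip(row, x))
            out.append(math.tanh(z))
        return out


# Usage: "strengthening" an input (raising some coordinates, lowering none)
# cannot lower any output, and composing two monotone layers stays monotone.
layer1 = MonotoneLayer(n_in=4, n_out=3, seed=42)
layer2 = MonotoneLayer(n_in=3, n_out=2, seed=43)

x = [0.1, -0.5, 0.3, 0.0]
x_stronger = [0.1, 0.2, 0.3, 0.4]
assert is_coordinatewise_leq(x, x_stronger)

y = layer2.forward(layer1.forward(x))
y_stronger = layer2.forward(layer1.forward(x_stronger))
print(is_coordinatewise_leq(y, y_stronger))  # True
```

The summary describes applying monotonicity only to specific components. In a sketch like this, one plausible reading is to use constrained layers in some places and ordinary unconstrained layers elsewhere; that is an assumption about how selectivity might be realized, not a description of the paper's method.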

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)