Limits of Convergence-Rate Control for Open-Weight Safety

Domenic Rosati, Xijie Zeng, Hong Huang, Sebastian Dionicio, Subhabrata Majumdar, Frank Rudzicz, and Hassan Sajjad

arXiv:2602.18868 · math.OC · February 24, 2026

TL;DR

This paper studies the theoretical limits of controlling how quickly open-weight foundation models can be fine-tuned, as a defense against harmful fine-tuning. It introduces spectral reparameterization and a new algorithm, SpecDef.

Contribution

It develops a spectral reparameterization approach and the SpecDef algorithm to slow the convergence of fine-tuning, and it establishes fundamental limits on convergence-rate control in adversarial settings.

Findings

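01. SpecDef provably and empirically slows first- and second-order optimization in non-adversarial settings.

02. Convergence-rate control has a fundamental limit against attackers with sufficient knowledge: they can restore fast convergence at a linear increase in model size.

03. Controlling the convergence rate alone is therefore insufficient for robust safety in adversarial scenarios.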

Abstract

Open-weight foundation models can be fine-tuned for harmful purposes after release, yet no existing training-resistance method provides theoretical guarantees. Treating these interventions as convergence-rate control problems allows us to connect optimization speed to the spectral structure of model weights. We leverage this insight to develop a novel understanding of convergence-rate control through spectral reparameterization and derive an algorithm, SpecDef, that can both provably and empirically slow first- and second-order optimization in non-adversarial settings. In adversarial settings, we establish a fundamental limit on a broad class of convergence-rate control methods, including our own: an attacker with sufficient knowledge can restore fast convergence at a linear increase in model size. To overcome this limitation, future work will need to investigate methods that…
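As a toy illustration of how optimization speed depends on spectral structure, the sketch below uses a quadratic loss f(w) = ½ wᵀHw − bᵀw. With the stable step size 1/λ_max(H), gradient descent contracts each eigendirection by 1 − λ_i/λ_max per step, so the spread of the spectrum sets the speed. A hypothetical defender reparameterizes w = Du with a diagonal scaling D, which changes the effective Hessian to DHD, inflates its top eigenvalue, and forces a smaller step. A hypothetical attacker who knows D maps back to the original coordinates and recovers the original speed. This is not SpecDef: the function `gd_steps_to_tol`, the matrices, and the scaling vector `d` are assumptions chosen for illustration, and the toy does not model the model-size cost of the attack described in the abstract.

```python
import numpy as np


def gd_steps_to_tol(H, b, tol=1e-6, max_steps=100_000):
    """Gradient descent on f(x) = 0.5 x^T H x - b^T x (H symmetric positive definite).

    Uses the step size 1 / lambda_max(H), so the slowest eigendirection
    contracts by a factor of 1 - lambda_min / lambda_max per step.
    Returns the number of steps until ||grad|| < tol.
    """
    eta = 1.0 / np.linalg.eigvalsh(H).max()
    x = np.zeros_like(b)
    for t in range(max_steps):
        grad = H @ x - b
        if np.linalg.norm(grad) < tol:
            return t
        x = x - eta * grad
    return max_steps


# Hypothetical "fine-tuning" problem with a well-conditioned Hessian.
H = np.diag([1.0, 0.5])
b = np.ones(2)
base_steps = gd_steps_to_tol(H, b)

# Hypothetical defender: reparameterize w = D u with a diagonal scaling D.
# In u-coordinates the loss has Hessian D H D and linear term D b.
d = np.array([10.0, 1.0])
D, D_inv = np.diag(d), np.diag(1.0 / d)
H_u, b_u = D @ H @ D, D @ b
reparam_steps = gd_steps_to_tol(H_u, b_u)

# Hypothetical attacker who knows D: map back to w = D u before optimizing.
H_w, b_w = D_inv @ H_u @ D_inv, D_inv @ b_u
attack_steps = gd_steps_to_tol(H_w, b_w)

print(base_steps, reparam_steps, attack_steps)
```

Under these assumptions the reparameterized run needs over a hundred times as many steps as the baseline, because the slow direction now contracts by 1 − 0.5/100 per step instead of 1 − 0.5/1. The attacker's run matches the baseline exactly, which mirrors, in miniature, why knowledge of the reparameterization undermines this kind of defense.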

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Smart Grid Security and Resilience