Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization
Furkan Mumcu, Yasin Yilmaz

TL;DR
This paper introduces Adversarially-Aligned Jacobian Regularization (AAJR), a Jacobian regularization method for agentic AI systems that controls sensitivity only along adversarial ascent directions. It is less conservative than global Jacobian bounds and aims to stabilize robust minimax training.
Contribution
The paper proposes AAJR, a trajectory-aligned Jacobian regularization technique that offers a less conservative, more effective way to enhance robustness in multi-agent AI systems.
Findings
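Under mild conditions, AAJR yields a strictly larger admissible policy class than global Jacobian constraints.
Under derived step-size conditions, AAJR controls effective smoothness along optimization trajectories.
The larger policy class implies a weakly smaller approximation gap and reduced nominal performance degradation, that is, a lower Price of Robustness.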
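The containment behind the first finding can be illustrated with a short derivation. This is a sketch under assumed notation, not the paper's own statement: the policy $\pi$, its input Jacobian $J_\pi(x)$, the bound $L$, the unit adversarial direction $v(x)$, and the class names $\mathcal{F}_{\mathrm{glob}}$ and $\mathcal{F}_{\mathrm{dir}}$ are all hypothetical labels introduced here. It shows only the non-strict inclusion, which follows from the definition of the operator norm; strictness depends on the paper's mild conditions.

```latex
% Assumed notation (not from the source): two constraint classes with the same bound L.
\mathcal{F}_{\mathrm{glob}}(L) = \{\pi : \|J_\pi(x)\|_2 \le L \ \ \forall x\}, \qquad
\mathcal{F}_{\mathrm{dir}}(L)  = \{\pi : \|J_\pi(x)\, v(x)\|_2 \le L \ \ \forall x\}, \qquad \|v(x)\|_2 = 1.

% For any unit vector v(x), the operator norm bounds the directional sensitivity:
\|J_\pi(x)\, v(x)\|_2 \;\le\; \|J_\pi(x)\|_2 \,\|v(x)\|_2 \;=\; \|J_\pi(x)\|_2 .

% Hence every policy satisfying the global bound also satisfies the directional one:
\mathcal{F}_{\mathrm{glob}}(L) \subseteq \mathcal{F}_{\mathrm{dir}}(L).
```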
Abstract
As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local curvature in the inner maximization. Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness. We introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity strictly along adversarial ascent directions. We prove that AAJR yields a strictly larger admissible policy class than global constraints under mild conditions, implying a weakly smaller approximation gap and reduced nominal performance degradation. Furthermore, we derive step-size conditions under which AAJR controls effective smoothness along optimization…
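The contrast between a global Jacobian penalty and one restricted to the adversarial ascent direction can be sketched numerically. The following is a minimal illustration and not the authors' implementation: the toy policy `tanh(W x)`, the squared-error loss, and every function name are assumptions made for this example. The ascent direction is taken as the normalized input gradient of the loss, which is one common choice and may differ from the paper's definition.

```python
import numpy as np


def policy(W, x):
    """Toy policy (assumed for illustration): f(x) = tanh(W x)."""
    return np.tanh(W @ x)


def jacobian(W, x):
    """Input Jacobian of tanh(W x): diag(1 - tanh^2(W x)) @ W."""
    h = np.tanh(W @ x)
    return (1.0 - h ** 2)[:, None] * W


def adversarial_direction(W, x, y):
    """Unit vector along the input gradient of 0.5 * ||f(x) - y||^2.

    The gradient is J^T (f(x) - y). Returns zeros if the gradient vanishes.
    """
    g = jacobian(W, x).T @ (policy(W, x) - y)
    norm = np.linalg.norm(g)
    return g / norm if norm > 0 else np.zeros_like(g)


def directional_penalty(W, x, y):
    """Squared sensitivity along the adversarial direction only: ||J v||^2."""
    v = adversarial_direction(W, x, y)
    return float(np.sum((jacobian(W, x) @ v) ** 2))


def global_penalty(W, x):
    """Squared Frobenius norm of the Jacobian: penalizes all directions."""
    return float(np.sum(jacobian(W, x) ** 2))


if __name__ == "__main__":
    # Hypothetical weights, input, and target chosen only for the demo.
    W = np.array([[2.0, 0.0], [0.0, 0.1]])
    x = np.array([0.1, 0.2])
    y = np.array([1.0, 0.0])
    print("directional penalty:", directional_penalty(W, x, y))
    print("global penalty:     ", global_penalty(W, x))
```

For a unit direction, the directional penalty never exceeds the Frobenius penalty, so a regularizer built on it leaves sensitivity in non-adversarial directions unconstrained. That is the sense in which a direction-aligned constraint is less conservative than a global one.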
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques
