TL;DR
This paper introduces Lipschitz-aware linearity grafting, a theoretically motivated and practical method that reduces over-approximation errors in neural network verification, yielding tighter local Lipschitz constants and improved certified robustness against adversarial examples.
Contribution
It provides the first theoretical analysis of how linearity grafting enhances certified robustness and proposes a new method to tighten local Lipschitz constants without certified training.
Findings
Linearity grafting improves certified robustness by tightening local Lipschitz constants.
Theoretical analysis explains how grafting reduces approximation errors.
Experiments show enhanced robustness through linearity grafting.
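As a toy illustration of the findings above, the sketch below shows why replacing an unstable ReLU with a linear function can narrow an interval bound on a network's derivative, and hence its local Lipschitz bound. It is not the paper's algorithm: the one-hidden-layer scalar network, the weights, the pre-activation bounds, the grafted slope of 0.5, and all function names are hypothetical choices made for this example.

```python
def slope_interval(l, u, grafted_slope=None):
    """Interval containing the activation's derivative for pre-activations in [l, u]."""
    if grafted_slope is not None:
        return (grafted_slope, grafted_slope)  # grafted linear neuron: constant slope
    if l >= 0:
        return (1.0, 1.0)  # stably active ReLU
    if u <= 0:
        return (0.0, 0.0)  # stably inactive ReLU
    return (0.0, 1.0)      # unstable ReLU: slope may be 0 or 1


def derivative_interval(w1, w2, bounds, grafted):
    """Interval for f'(x) where f(x) = sum_j w2[j] * act_j(w1[j] * x + b_j), scalar x.

    `bounds[j]` is the (lower, upper) pre-activation bound of neuron j over the
    input region; `grafted` maps a neuron index to its (assumed) linear slope.
    """
    lo = hi = 0.0
    for j, (l, u) in enumerate(bounds):
        s_lo, s_hi = slope_interval(l, u, grafted.get(j))
        c = w1[j] * w2[j]
        terms = (c * s_lo, c * s_hi)
        lo += min(terms)
        hi += max(terms)
    return lo, hi


def lipschitz_bound(interval):
    """Upper bound on |f'(x)| implied by an interval enclosing f'(x)."""
    lo, hi = interval
    return max(abs(lo), abs(hi))


# Hypothetical network: neurons 0 and 1 are unstable, neuron 2 is stably active.
w1 = [1.0, -2.0, 0.5]
w2 = [1.0, 1.0, -3.0]
bounds = [(-1.0, 2.0), (-0.5, 0.5), (0.2, 1.0)]

plain = derivative_interval(w1, w2, bounds, grafted={})
grafted = derivative_interval(w1, w2, bounds, grafted={1: 0.5})

print(plain, lipschitz_bound(plain))      # (-3.5, -0.5) 3.5
print(grafted, lipschitz_bound(grafted))  # (-2.5, -1.5) 2.5
```

Grafting also changes the function being bounded, so the tighter interval here depends on the chosen slope. Which neurons to graft, and with what parameters, is the question the paper's method addresses.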
Abstract
Lipschitz constant is a fundamental property in certified robustness, as smaller values imply robustness to adversarial examples when a model is confident in its prediction. However, identifying the worst-case adversarial examples is known to be an NP-complete problem. Although over-approximation methods have shown success in neural network verification to address this challenge, reducing approximation errors remains a significant obstacle. Furthermore, these approximation errors hinder the ability to obtain tight local Lipschitz constants, which are crucial for certified robustness. Originally, grafting linearity into non-linear activation functions was proposed to reduce the number of unstable neurons, enabling scalable and complete verification. However, no prior theoretical analysis has explained how linearity grafting improves certified robustness. We instead consider linearity…
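The abstract's opening claim can be made concrete with the standard margin-based certificate, sketched here under stated assumptions. If `lipschitz` bounds the L2 Lipschitz constant of the logit map on a ball around the input, no perturbation smaller than margin / (sqrt(2) * lipschitz) can change the prediction, provided the bound holds on a ball at least that large. The function name and example values are hypothetical and are not taken from the paper.

```python
import math


def certified_radius(logits, label, lipschitz):
    """L2 radius within which the predicted class provably cannot change.

    Assumes `lipschitz` upper-bounds the L2 Lipschitz constant of the logit
    vector on a ball (around the input) at least as large as the returned radius.
    """
    runner_up = max(v for i, v in enumerate(logits) if i != label)
    margin = logits[label] - runner_up
    if margin <= 0:
        return 0.0  # misclassified or tied: nothing can be certified
    return margin / (math.sqrt(2) * lipschitz)


logits = [3.0, 1.0, 0.5]  # confident prediction for class 0 (margin = 2.0)
loose = certified_radius(logits, label=0, lipschitz=2.0)
tight = certified_radius(logits, label=0, lipschitz=1.0)
print(loose, tight)  # halving the Lipschitz bound doubles the certified radius
```

The radius grows with the margin and shrinks with the Lipschitz bound, which is why a tighter local bound yields a larger certificate for the same prediction.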
Peer Reviews
Decision·Submitted to ICLR 2026
++ It is novel and interesting to consider verified robustness from the perspective of controlling the Lipschitz constant and grafting.
++ The proposed method is generic and plug-and-play.
++ The intuition is theoretically justified to some degree.
1. The proposed framework introduces many hyper-parameters, including the number of grafted neurons per layer, $k$ in Equation (5), and $\lambda$, $\beta$ and $\gamma$ in Equation (6). I believe all of these hyper-parameters affect performance to some degree. However, I did not see ablation studies or adequate discussion of them.
2. The experiments are weak in general. Comparisons with more robustness certification and the corresponding provable training algorithms should be included
1. The paper is well organized and clearly presented.
2. The paper conducted extensive ablation studies to evaluate the effectiveness of the proposed method.
1. The improvement is not significant and is inconsistent (as shown in Tables 1 and 2).
2. The core idea and methodology lack sufficient insight and novelty. Given that Lipschitz neural networks, such as LiResNet++ (Hu, 2024), have already scaled certified robustness to ImageNet and billion-parameter models, the proposed approach appears less competitive. Therefore, I would expect either a substantial performance improvement or a more conceptually innovative contribution.
[1] Hu, Kai, et al. "A Reci
The proposed approach improves upon the existing Linearity Grafting method and demonstrates potential in further tightening local Lipschitz constants.
* The authors state at the beginning of the abstract that “Lipschitz constant is a fundamental property in certified robustness, as smaller values imply robustness to adversarial examples.” However, despite obtaining a tighter local Lipschitz constant, the empirical results (Table 1) show a drop in robust accuracy (RA %). This seems contradictory to the intended goal of improving robustness. Could the authors clarify whether this behavior aligns with their theoretical claims?
* The standard acc
