LipNeXt: Scaling up Lipschitz-based Certified Robustness to Billion-parameter Models

Kai Hu; Haoqi Hu; Matt Fredrikson

arXiv:2601.18513·cs.LG·January 27, 2026

Open Access · 3 Reviews

TL;DR

LipNeXt introduces a scalable, constraint-free, convolution-free 1-Lipschitz architecture that achieves state-of-the-art certified robustness on large models and datasets, demonstrating the potential of Lipschitz-based certification for modern deep learning.

Contribution

The paper presents LipNeXt, a novel 1-Lipschitz architecture that scales to billion-parameter models using manifold optimization and spatial shift modules, without convolutions or constraints.
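One way to see what "updating parameters directly on the orthogonal manifold" can mean is a Cayley-retraction step. The following is a minimal sketch of that textbook technique, not the paper's optimizer: the names `cayley_step` and `orthogonality_error`, the learning rate, and the use of plain nested lists are all assumptions made for illustration.

```python
# Minimal sketch (assumption: a generic Cayley retraction, NOT the paper's optimizer).
# An orthogonal W is moved along a skew-symmetric direction, so W stays orthogonal
# without any projection or penalty term.

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(ra) + list(rb) for ra, rb in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def cayley_step(w, grad, lr):
    """One descent step on the orthogonal manifold (hypothetical helper).

    A = (W^T G - G^T W) / 2 is the skew-symmetric part of W^T G.
    The update is W <- W (I + lr/2 A)^{-1} (I - lr/2 A), which is orthogonal
    whenever W is, because the Cayley transform of a skew matrix is orthogonal.
    """
    n = len(w)
    wt_g = matmul(transpose(w), grad)
    a = [[(wt_g[i][j] - wt_g[j][i]) / 2.0 for j in range(n)] for i in range(n)]
    eye = identity(n)
    left = [[eye[i][j] + 0.5 * lr * a[i][j] for j in range(n)] for i in range(n)]
    right = [[eye[i][j] - 0.5 * lr * a[i][j] for j in range(n)] for i in range(n)]
    return matmul(w, solve(left, right))

def orthogonality_error(w):
    """Largest absolute entry of W^T W - I."""
    n = len(w)
    wtw = matmul(transpose(w), w)
    return max(abs(wtw[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))
```

Because the step never leaves the manifold, the weight matrix keeps spectral norm 1 after every update, which is the property a 1-Lipschitz linear layer needs.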

Findings

01

Achieves state-of-the-art certified robustness on CIFAR-10/100 and Tiny-ImageNet.

02

Scales to 1-2 billion parameters on ImageNet, improving robustness over prior Lipschitz models.

03

Maintains efficient, stable low-precision training while providing deterministic robustness guarantees.
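The "deterministic robustness guarantees" in these findings follow from a standard margin argument. If a classifier is $L$-Lipschitz in the $L_2$ norm, the difference of any two logits is $\sqrt{2}L$-Lipschitz, so the prediction cannot change within radius margin$/(\sqrt{2}L)$. The sketch below illustrates that generic bound; the function names and the default `lipschitz_const=1.0` are assumptions for illustration, not code from the paper.

```python
import math

def certified_radius(logits, lipschitz_const=1.0):
    """L2 radius within which the top-1 prediction provably cannot change.

    Assumes the network mapping inputs to logits is `lipschitz_const`-Lipschitz
    in the L2 norm (1.0 for a 1-Lipschitz architecture).
    """
    ranked = sorted(logits, reverse=True)
    margin = ranked[0] - ranked[1]  # top logit minus runner-up
    return margin / (math.sqrt(2.0) * lipschitz_const)

def is_certified(logits, label, eps, lipschitz_const=1.0):
    """True if the prediction is correct and robust to any L2 perturbation of norm <= eps."""
    pred = max(range(len(logits)), key=logits.__getitem__)
    return pred == label and certified_radius(logits, lipschitz_const) > eps
```

Certified robust accuracy at a budget `eps` is then the fraction of test points for which `is_certified` holds. It takes one forward pass per input, with no sampling or attack search.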

Abstract

Lipschitz-based certification offers efficient, deterministic robustness guarantees but has struggled to scale in model size, training efficiency, and ImageNet performance. We introduce LipNeXt, the first constraint-free and convolution-free 1-Lipschitz architecture for certified robustness. LipNeXt is built using two techniques: (1) a manifold optimization procedure that updates parameters directly on the orthogonal manifold and (2) a Spatial Shift Module to model spatial patterns without convolutions. The full network uses orthogonal projections, spatial shifts, a simple 1-Lipschitz $\beta$-Abs nonlinearity, and $L_2$ spatial pooling to maintain tight Lipschitz control while enabling expressive feature mixing. Across CIFAR-10/100 and Tiny-ImageNet, LipNeXt achieves state-of-the-art clean and certified robust accuracy (CRA), and on ImageNet it scales to 1-2B…
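To illustrate why a spatial shift can replace convolution without loosening the Lipschitz bound, here is a minimal sketch. It assumes a circular (wrap-around) shift and a hypothetical set of per-channel offsets (`OFFSETS`); the abstract does not specify either, so both are illustrative choices. A circular shift only permutes entries, so it preserves $L_2$ norms and distances exactly.

```python
# Minimal sketch (assumptions: circular padding, hypothetical offset pattern).
# Input layout: x[channel][row][col], stored as nested lists.

OFFSETS = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]  # hypothetical (dy, dx) per channel group

def circular_shift(channel, dy, dx):
    """Shift one H x W channel by (dy, dx) with wrap-around."""
    h, w = len(channel), len(channel[0])
    return [[channel[(i - dy) % h][(j - dx) % w] for j in range(w)] for i in range(h)]

def spatial_shift(x):
    """Shift each channel by an offset chosen from OFFSETS by channel index."""
    return [circular_shift(ch, *OFFSETS[c % len(OFFSETS)]) for c, ch in enumerate(x)]

def sq_norm(x):
    """Squared L2 norm of a C x H x W tensor."""
    return sum(v * v for ch in x for row in ch for v in row)

def sq_dist(x, y):
    """Squared L2 distance between two tensors of the same shape."""
    return sum((a - b) ** 2
               for cx, cy in zip(x, y)
               for rx, ry in zip(cx, cy)
               for a, b in zip(rx, ry))
```

After the shift, each spatial position holds features that came from neighbouring positions in different channels, so a following orthogonal channel-mixing layer can combine spatial context while the composition stays 1-Lipschitz.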

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The paper is well-organized and clearly written.
2. The proposed manifold optimization and spatial shift techniques are interesting and technically sound.

Weaknesses

1. Some important baselines are not discussed in the related work section, such as Sandwich and BRONet.
2. The method integrates several known techniques, making it somewhat difficult to assess the individual effectiveness of each component.

Reviewer 02 · Rating 6 · Confidence 2

Strengths

The paper provides solid empirical evidence, along with an ablation study, to support the main claims.

Weaknesses

Table 2 presents results with additional data; however, I noticed that the total number of parameters for the proposed model is 256M, which is significantly larger than the competitors. Could the authors provide results for a smaller model configuration, such as L32W1024?

Reviewer 03 · Rating 8 · Confidence 4

Strengths

The proposed method exhibits strong rigor and novelty. Almost all design choices are well-motivated and supported by strong theoretical analysis. The final overall performance demonstrates the effectiveness of the general algorithm. Detailed ablation studies are provided in the appendix to demonstrate the effectiveness of individual modules.

Weaknesses

The overall algorithm seems costly. On the memory side, the optimizer requires a copy of the full parameter set, which doubles the memory cost; this is especially concerning for a model with billions of parameters. On the computation side, the main results are conducted on 8×H100 GPUs, which seems hard for academic labs to reproduce and not scalable to harder tasks. I will not attack the main contribution over these costs, though. The parameter efficiency is in question. All comparisons, although meaningf

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning