Compression Aware Certified Training
Changming Xu, Gagandeep Singh

TL;DR
CACTUS is a unified training framework that enhances neural network robustness and efficiency by integrating compression techniques like pruning and quantization during training, ensuring high certified accuracy even after compression.
Contribution
The paper introduces CACTUS, a method that combines compression with certified robustness training and outperforms existing approaches in accuracy and certified robustness for pruned and quantized models across multiple datasets.
Findings
CACTUS maintains high certified accuracy after compression.
It achieves state-of-the-art results for pruning and quantization.
It is effective across various datasets and input specifications.
Abstract
Deep neural networks deployed in safety-critical, resource-constrained environments must balance efficiency and robustness. Existing methods treat compression and certified robustness as separate goals, compromising either efficiency or safety. We propose CACTUS (Compression Aware Certified Training Using network Sets), a general framework for unifying these objectives during training. CACTUS models maintain high certified accuracy even when compressed. We apply CACTUS for both pruning and quantization and show that it effectively trains models which can be efficiently compressed while maintaining high accuracy and certifiable robustness. CACTUS achieves state-of-the-art accuracy and certified performance for both pruning and quantization on a variety of datasets and input specifications.
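The idea of training against a set of compressed networks can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the helper names (`magnitude_prune`, `uniform_quantize`, `compression_set_loss`), the mixing weight `mix`, and the toy squared-error loss are all assumptions made for this example. In the paper the per-network loss would be a certified-training loss, which is not reproduced here.

```python
from typing import Callable, List, Sequence

Weights = List[float]


def magnitude_prune(weights: Sequence[float], sparsity: float) -> Weights:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(round(sparsity * len(weights)))  # number of weights to drop
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])  # indices of the k smallest |w|
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def uniform_quantize(weights: Sequence[float], q_step: float) -> Weights:
    """Round every weight to the nearest multiple of `q_step`."""
    return [q_step * round(w / q_step) for w in weights]


def compression_set_loss(
    weights: Sequence[float],
    compressors: Sequence[Callable[[Sequence[float]], Weights]],
    loss_fn: Callable[[Sequence[float]], float],
    mix: float = 0.5,
) -> float:
    """Blend the uncompressed loss with the mean loss over a compression set.

    `mix` is a hypothetical weighting: 0.0 ignores the compressed networks,
    1.0 looks only at them.
    """
    base = loss_fn(list(weights))
    compressed = [loss_fn(c(weights)) for c in compressors]
    return (1.0 - mix) * base + mix * sum(compressed) / len(compressed)


def toy_loss(weights: Sequence[float]) -> float:
    """Hypothetical stand-in for a certified loss: squared distance to a target."""
    target = [1.0, 0.0, 0.5, -0.5]
    return sum((w - t) ** 2 for w, t in zip(weights, target))


weights = [0.9, -0.05, 0.4, -0.7]
compression_set = [
    lambda w: magnitude_prune(w, 0.5),   # 50% magnitude pruning
    lambda w: uniform_quantize(w, 0.5),  # uniform grid with step 0.5
]
print(compression_set_loss(weights, compression_set, toy_loss, mix=0.5))
```

Averaging over the set is one simple choice; a worst-case (maximum) loss over the set would be another, and nothing in this summary says which one the paper uses.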
Peer Reviews
Decision: Submitted to ICLR 2026
Deploying robust models on resource-limited devices is an important and timely research direction. The joint training objective is clearly defined and implemented. The use of compression sets and curriculum-based loss weighting is technically reasonable. Experiments are carefully executed and include ablations (AWP radius, compression-set size). CACTUS consistently outperforms sequential baselines in certified accuracy under compression. The paper is well written, equations are clean, and imp
The work overlooks Gui et al. (2019), "Model Compression with Adversarial Robustness: A Unified Optimization Framework" (ATMC), which already introduced a unified optimization framework combining model compression (pruning and quantization) with adversarial training. While ATMC focused on empirical rather than certified robustness, the underlying idea (joint optimization of robustness and compression) is the same. A clearer connection to this prior line of work would strengthen the paper’s positioning
- Clear problem formulation unifying certified training with compression; objective over a compression set is well motivated. - Theory for quantization: a clean reduction from quantization to weight-bounded perturbations via AWP with a formal upper-bound guarantee. - Consistent empirical gains under compression: across pruning and quantization, CACTUS improves certified accuracy versus robust baselines; integration with multiple certified-training losses shows method generality. - Ablations beyo
- Scope of headline comparisons: By design, CACTUS is strongest when compressed; for $\delta$=0 or unquantized, SABR typically wins. This is expected but should be emphasized alongside deployment guidance. - Condition discrepancy: Main text, Theorem 4.1 states $q_{step}\leq \eta$ while Appendix Theorem D.1 states $q_{step}\leq 2\eta$. The bound is fine, yet the precise requirement should be consistently stated. - Compute cost: Training time overhead is non-trivial; although addressed in Append
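The relationship between a quantization step and a weight-perturbation radius can be checked numerically. The sketch below assumes round-to-nearest uniform quantization, for which each weight moves by at most half a step; the paper's quantizer $Q(\cdot)$ may be defined differently, so this does not settle which of the two stated conditions is the intended one. The function names are hypothetical.

```python
from typing import Sequence


def round_to_nearest(w: float, q_step: float) -> float:
    """Map a weight to the nearest multiple of q_step (assumed quantizer)."""
    return q_step * round(w / q_step)


def max_rounding_error(weights: Sequence[float], q_step: float) -> float:
    """Largest absolute change any weight undergoes when quantized."""
    return max(abs(round_to_nearest(w, q_step) - w) for w in weights)


def within_radius(
    weights: Sequence[float], q_step: float, eta: float, tol: float = 1e-12
) -> bool:
    """True if quantization stays inside a per-weight perturbation radius eta."""
    return max_rounding_error(weights, q_step) <= eta + tol


weights = [0.26, -0.74, 0.5, 0.11]
q_step = 0.5

# Under round-to-nearest the error never exceeds q_step / 2.
print(max_rounding_error(weights, q_step) <= q_step / 2 + 1e-12)

# A radius of q_step / 2 covers the quantized weights; a smaller one may not.
print(within_radius(weights, q_step, eta=q_step / 2))
print(within_radius(weights, q_step, eta=0.2))
```

Under this assumed quantizer, any radius $\eta \geq q_{step}/2$ contains the quantized network, so both $q_{step}\leq \eta$ and $q_{step}\leq 2\eta$ are sufficient and the second is the weaker requirement.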
1. This paper is the first work to address certified robustness for model compression. 2. The background section is well-written, providing a clear foundation to understand the problem.
1. **Writing and Notation Quality**. The paper suffers from many writing and notation issues that significantly affect readability and clarity. - Several key symbols are used before being defined, such as $\theta$, $Q(\cdot)$, $\Delta$, and $\eta$. - The notation convention stated in Section 2 is inconsistent with later usage. For example, the authors claim lowercase bold letters denote vectors, but $\theta$, which may represent network parameters, should arguably be bold ($\boldsymbol{\theta}$)
Taxonomy
Topics: Robotic Mechanisms and Dynamics
