A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy

Maxime Heuillet; Rishika Bhagwatkar; Jonas Ngnawé; Yann Pequignot; Alexandre Larouche; Christian Gagné; Irina Rish; Ola Ahmad; Audrey Durand

arXiv:2508.14079·cs.LG·August 21, 2025

TL;DR

This paper provides a comprehensive empirical analysis of how architecture, pretraining, and optimization choices affect the robustness of deep learning models to input perturbations, offering practical insights for improving generalization.

Contribution

It presents the most diverse benchmark to date on robust fine-tuning, analyzing 1,440 configurations across multiple datasets, architectures, and perturbations, revealing key factors influencing robustness.

Findings

01. Supervised pretrained CNNs often outperform attention-based models in robustness.

02. Design choices like architecture and loss functions significantly impact generalization to unseen perturbations.

03. The study offers practical guidance for selecting model and training strategies to enhance robustness.

Abstract

Deep learning models operating in the image domain are vulnerable to small input perturbations. For years, robustness to such perturbations was pursued by training models from scratch (i.e., with random initializations) using specialized loss objectives. Recently, robust fine-tuning has emerged as a more efficient alternative: instead of training from scratch, pretrained models are adapted to maximize predictive performance and robustness. To conduct robust fine-tuning, practitioners design an optimization strategy that includes the model update protocol (e.g., full or partial) and the specialized loss objective. Additional design choices include the architecture type and size, and the pretrained representation. These design choices affect robust generalization, which is the model's ability to maintain performance when exposed to new and unseen perturbations at test time. Understanding…
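As an illustration of the two ingredients the abstract names for an optimization strategy (the model update protocol and the specialized loss objective), here is a minimal sketch in plain Python. It is not the paper's code or benchmark. Everything in it is an assumption made for illustration: the toy model (a linear "backbone" `W` followed by a logistic "head" `v`, `b`), the choice of one-step FGSM adversarial training as the specialized objective, and the names `forward`, `fgsm`, and `robust_step`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(t):
    return (t > 0) - (t < 0)


def forward(W, v, b, x):
    """Toy model: backbone h = W x (linear), head p = sigmoid(v . h + b)."""
    h = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    p = sigmoid(sum(vi * hi for vi, hi in zip(v, h)) + b)
    return h, p


def fgsm(W, v, b, x, y, eps):
    """One-step L-infinity perturbation: x + eps * sign(dLoss/dx)."""
    _, p = forward(W, v, b, x)
    # For the logistic loss on this model, dLoss/dx = (p - y) * W^T v.
    grad_x = [
        (p - y) * sum(v[i] * W[i][j] for i in range(len(W)))
        for j in range(len(x))
    ]
    return [xj + eps * sign(g) for xj, g in zip(x, grad_x)]


def robust_step(W, v, b, x, y, eps=0.1, lr=0.1, update="partial"):
    """One SGD step on the loss evaluated at the perturbed input.

    update="partial" changes only the head (v, b);
    update="full" also changes the backbone W.
    """
    x_adv = fgsm(W, v, b, x, y, eps)
    h, p = forward(W, v, b, x_adv)
    err = p - y  # derivative of the logistic loss w.r.t. the logit
    new_v = [vi - lr * err * hi for vi, hi in zip(v, h)]
    new_b = b - lr * err
    if update == "full":
        new_W = [
            [w - lr * err * v[i] * xj for w, xj in zip(W[i], x_adv)]
            for i in range(len(W))
        ]
    else:
        new_W = [row[:] for row in W]  # backbone stays frozen
    return new_W, new_v, new_b


# Hypothetical "pretrained" weights and a single labeled example.
W0 = [[0.5, -0.2], [0.1, 0.3]]
v0, b0 = [1.0, -1.0], 0.0
x, y = [1.0, 2.0], 1

W_partial, v_partial, b_partial = robust_step(W0, v0, b0, x, y, update="partial")
W_full, v_full, b_full = robust_step(W0, v0, b0, x, y, update="full")
```

In this sketch the two protocols differ only in whether the backbone weights move. The head update is identical in both cases because both use the same perturbed input and the same starting weights.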

Taxonomy

Topics: Manufacturing Process and Optimization