Revisiting Random Weight Perturbation for Efficiently Improving Generalization

Tao Li; Qinghua Tao; Weihao Yan; Zehao Lei; Yingwen Wu; Kun Fang; Mingzhen He; Xiaolin Huang

arXiv:2404.00357 · cs.LG · April 2, 2024 · 2 cites

TL;DR

This paper revisits random weight perturbation (RWP) for deep neural networks, proposing improvements that enhance its efficiency and performance, making it competitive with sharpness-aware minimization (SAM) in generalization tasks.

Contribution

The paper introduces improved RWP methods that better balance generalization and convergence and refine how random perturbations are generated, achieving superior efficiency and results comparable to or better than those of SAM.

Findings

1. Enhanced RWP methods outperform traditional RWP in large-scale problems.
2. Proposed techniques achieve similar or better generalization than SAM.
3. Improvements lead to more efficient training with better generalization.

Abstract

Improving the generalization ability of modern deep neural networks (DNNs) is a fundamental challenge in machine learning. Two branches of methods have been proposed to seek flat minima and improve generalization: one led by sharpness-aware minimization (SAM) minimizes the worst-case neighborhood loss through adversarial weight perturbation (AWP), and the other minimizes the expected Bayes objective with random weight perturbation (RWP). While RWP offers advantages in computation and is closely linked to AWP on a mathematical basis, its empirical performance has consistently lagged behind that of AWP. In this paper, we revisit the use of RWP for improving generalization and propose improvements from two perspectives: i) the trade-off between generalization and convergence and ii) the random perturbation generation. Through extensive experimental evaluations, we demonstrate that our…
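
The two perturbation schemes contrasted in the abstract can be illustrated with a minimal sketch on a toy quadratic loss. This is not the authors' implementation: the class QuadraticLoss, the functions awp_gradient, rwp_gradient, and sgd, and the values chosen for rho, sigma, and lr are all hypothetical names and settings picked for illustration. The sketch assumes the usual SAM-style formulation of AWP (perturb the weights along the normalized gradient, then take the gradient there) and an isotropic Gaussian for RWP; the paper's improved variants are not reproduced here.

```python
import math
import random


class QuadraticLoss:
    """Hypothetical toy loss 0.5 * sum(a_i * w_i^2) that counts gradient calls."""

    def __init__(self, curvature):
        self.curvature = curvature
        self.grad_calls = 0  # number of gradient evaluations so far

    def value(self, w):
        return 0.5 * sum(a * x * x for a, x in zip(self.curvature, w))

    def grad(self, w):
        self.grad_calls += 1
        return [a * x for a, x in zip(self.curvature, w)]


def awp_gradient(problem, w, rho):
    """SAM-style AWP: gradient taken at the adversarially perturbed weights."""
    g = problem.grad(w)  # first pass: find the ascent direction
    norm = math.sqrt(sum(x * x for x in g)) or 1.0  # avoid division by zero
    eps = [rho * x / norm for x in g]  # perturbation of length rho
    return problem.grad([x + e for x, e in zip(w, eps)])  # second pass


def rwp_gradient(problem, w, sigma, rng):
    """RWP: gradient taken at weights perturbed by Gaussian noise (one pass)."""
    eps = [rng.gauss(0.0, sigma) for _ in w]
    return problem.grad([x + e for x, e in zip(w, eps)])


def sgd(w, gradient_fn, lr=0.05, steps=100):
    """Plain gradient descent driven by the supplied gradient estimator."""
    for _ in range(steps):
        g = gradient_fn(w)
        w = [x - lr * gi for x, gi in zip(w, g)]
    return w


# Usage: compare the number of gradient evaluations for the same step budget.
w0 = [1.0, 1.0]

awp_problem = QuadraticLoss([1.0, 10.0])
w_awp = sgd(w0, lambda w: awp_gradient(awp_problem, w, rho=0.05))

rwp_problem = QuadraticLoss([1.0, 10.0])
rng = random.Random(0)  # seeded so the run is repeatable
w_rwp = sgd(w0, lambda w: rwp_gradient(rwp_problem, w, sigma=0.05, rng=rng))

print(awp_problem.grad_calls, rwp_problem.grad_calls)  # 200 100
```

In this sketch the AWP estimator needs two gradient evaluations per step while the RWP estimator needs one, which is the computational advantage the abstract refers to. The toy loss is only meant to show the mechanics; it says nothing about the generalization gap the paper studies.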

Code & Models

Repositories

nblt/marwp (PyTorch, official)

Taxonomy

Topics: Control Systems and Identification · Neural Networks and Applications · Digital Filter Design and Implementation

Methods: Sharpness-Aware Minimization