Recovering Plasticity of Neural Networks via Soft Weight Rescaling

Seungwon Oh; Sangyeon Park; Isaac Han; Kyung-Joong Kim

arXiv:2507.04683·cs.LG·July 8, 2025

3 Reviews

TL;DR

This paper introduces Soft Weight Rescaling (SWR), a method to prevent unbounded weight growth in neural networks, thereby recovering plasticity and improving learning performance without losing learned information.

Contribution

The paper proposes SWR, a novel technique that bounds weight magnitudes and maintains network plasticity, with theoretical proofs and empirical validation across various learning scenarios.

Findings

01. SWR effectively bounds weight magnitudes during training.

02. SWR improves performance in continual and warm-start learning.

03. SWR maintains learned information while enhancing plasticity.

Abstract

Recent studies have shown that as training progresses, neural networks gradually lose their capacity to learn new information, a phenomenon known as plasticity loss. Unbounded weight growth is one of the main causes of plasticity loss. Furthermore, it harms generalization capability and disrupts optimization dynamics. Re-initializing the network can be a solution, but it results in the loss of learned information, leading to performance drops. In this paper, we propose Soft Weight Rescaling (SWR), a novel approach that prevents unbounded weight growth without losing information. SWR recovers the plasticity of the network by simply scaling down the weights at each step of the learning process. We theoretically prove that SWR bounds weight magnitude and balances weight magnitude between layers. Our experiments show that SWR improves performance on warm-start learning, continual…
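The summary above does not give SWR's exact update rule, so the following is only a minimal Python sketch of the general idea of scaling weights down after every step so that their magnitude stays bounded while their direction is kept. It assumes a hypothetical rule that moves a layer's Frobenius norm a fraction `strength` of the way back toward its norm at initialization. The function names, the `strength` parameter, and the multiplicative `growth` factor standing in for gradient updates are all illustrative assumptions, not the authors' algorithm.

```python
import math


def frobenius_norm(weights):
    """Frobenius norm of a weight matrix stored as a list of rows."""
    return math.sqrt(sum(w * w for row in weights for w in row))


def soft_rescale(weights, init_norm, strength=0.1):
    """Hypothetical soft rescaling step (an assumption, not the paper's rule).

    If the layer's norm has grown past its initial norm, multiply every
    weight by one shared factor so the norm moves `strength` of the way
    back toward `init_norm`. A shared factor keeps the weight direction.
    """
    norm = frobenius_norm(weights)
    if norm <= init_norm:
        return [row[:] for row in weights]  # nothing to shrink
    target = (1.0 - strength) * norm + strength * init_norm
    scale = target / norm  # in (0, 1) because target < norm
    return [[w * scale for w in row] for row in weights]


def simulate(steps=200, growth=1.05, strength=0.1):
    """Compare weight norms with and without rescaling.

    `growth` is a toy stand-in for training updates that inflate the
    weights by 5% per step; real gradient updates are not modeled here.
    """
    init = [[0.5, -0.25], [0.1, 0.3]]
    init_norm = frobenius_norm(init)
    plain, rescaled = init, init
    for _ in range(steps):
        plain = [[w * growth for w in row] for row in plain]
        rescaled = [[w * growth for w in row] for row in rescaled]
        rescaled = soft_rescale(rescaled, init_norm, strength)
    return init_norm, frobenius_norm(plain), frobenius_norm(rescaled)


if __name__ == "__main__":
    init_norm, plain_norm, rescaled_norm = simulate()
    print(f"initial norm:          {init_norm:.4f}")
    print(f"norm without rescale:  {plain_norm:.4f}")
    print(f"norm with rescale:     {rescaled_norm:.4f}")
```

In this toy setting the rescaled norm settles near a fixed multiple of the initial norm, while the unrescaled norm grows geometrically. Because every weight in the layer is multiplied by the same positive factor, the relative proportions of the weights are unchanged, which is the sense in which a rescaling step can shrink magnitude without discarding what the layer has learned.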

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01·Rating 5·Confidence 3

Strengths

1. The paper is easy to follow.
2. The authors focus on an interesting topic, loss of plasticity, that is worth probing.
3. The proposed method is simple and can be easily implemented in practice.

Weaknesses

1. Unbounded weight growth is one of the main causes of plasticity loss, and the authors propose reducing weight magnitude through weight scaling. Reducing the weight magnitude is already common in training, where L2 is widely used. So I think the key here lies in comparing the proposed method to L2. However, after reviewing the text, I did not find a clear rationale for why we should choose the proposed method over L2. Could the authors provide specific cases that demonstrate the

Reviewer 02·Rating 5·Confidence 3

Strengths

- The paper is overall clearly written and the method is adequately described.
- The proposed method SWR is computationally more efficient than previously proposed methods.
- The experimental results and analysis provided in the paper are insightful.

Weaknesses

- The experimental results on smaller models are quite weak. For example, in the warm-start and continual learning experiments, L2 (or S&P) seems to be better in most experiments (including the ones in the appendix). Even in Table 1, except for VGG, I wouldn't say the improvements are significantly higher, since there is quite a bit of overlap with L2 in terms of standard deviations in the MLP and CNN cases. SWR only performs well on VGG, which is not a very popular architecture even for vision-based expe

Reviewer 03·Rating 3·Confidence 4

Strengths

- This work progressively establishes and justifies its framework, making the paper easy to follow.
- The results are promising; however, I have some concerns regarding the results, as discussed below.

Weaknesses

- One main drawback of the paper is its limited applicability. The authors make many assumptions (e.g., the network is affine, homogeneous with ReLU), which limits the contributions and the applicability of the paper in real-world scenarios.
- Some crucial statements are made without proper references. Furthermore, these statements conflict with statements in various peer-reviewed and significant publications.
- The paper came up with many theorems and definitions without e
