Neural Network Optimization Reimagined: Decoupled Techniques for Scratch and Fine-Tuning

Xin Ning; Qiankun Li; Xiaolong Huang; Qiupu Chen; Feng He; Weijun Li; Prayag Tiwari; Xinwang Liu

arXiv:2604.22838·cs.CV·April 28, 2026

TL;DR

This paper introduces DualOpt, a novel optimizer that decouples techniques for training neural networks from scratch and fine-tuning pre-trained models, enhancing convergence, generalization, and knowledge retention.

Contribution

The paper proposes DualOpt, which incorporates layer-wise weight decay for scratch training and weight rollback for fine-tuning, addressing the distinct needs of these paradigms.
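To make the scratch-training idea concrete, here is a minimal sketch of a decoupled weight-decay step in which each layer gets its own decay coefficient. This page does not give the paper's actual "real-time" rule, so the linear depth schedule, the function names `layerwise_decay_coeffs` and `sgd_step_layerwise_decay`, and the parameters `base_decay` and `min_scale` are all assumptions made for illustration, not DualOpt's formulation.

```python
def layerwise_decay_coeffs(num_layers, base_decay=0.01, min_scale=0.1):
    """Hypothetical schedule: decay grows linearly from the first layer to the last.

    The direction and shape of the schedule are assumptions, not the paper's rule.
    """
    if num_layers == 1:
        return [base_decay]
    return [
        base_decay * (min_scale + (1.0 - min_scale) * i / (num_layers - 1))
        for i in range(num_layers)
    ]


def sgd_step_layerwise_decay(layers, grads, lr, coeffs):
    """One plain SGD step with decoupled, per-layer weight decay.

    layers, grads: list of layers, each a list of floats.
    coeffs: one decay coefficient per layer.
    """
    new_layers = []
    for weights, layer_grads, decay in zip(layers, grads, coeffs):
        new_layers.append(
            # gradient step, then shrink each weight toward zero by lr * decay
            [w - lr * g - lr * decay * w for w, g in zip(weights, layer_grads)]
        )
    return new_layers


coeffs = layerwise_decay_coeffs(3)
layers = [[1.0, -1.0], [1.0], [1.0]]
grads = [[0.0, 0.0], [0.0], [0.0]]
updated = sgd_step_layerwise_decay(layers, grads, lr=0.1, coeffs=coeffs)
```

With zero gradients, the only change in `updated` comes from the decay term, so the last layer shrinks more than the first under this assumed schedule.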

Findings

01. DualOpt achieves state-of-the-art results across multiple tasks.

02. Layer-wise weight decay improves convergence and generalization.

03. Weight rollback mitigates knowledge forgetting during fine-tuning.
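One way to picture weight rollback is as a pull toward the pre-trained weights after each gradient step. The sketch below assumes a simple interpolation controlled by a hypothetical `rollback` factor in [0, 1]. The abstract on this page is truncated before it describes the actual mechanism, so this is an illustration of the general idea rather than the paper's update rule.

```python
def rollback_step(weights, grads, pretrained, lr, rollback):
    """One SGD step followed by a partial rollback toward pre-trained weights.

    rollback = 0.0 gives plain SGD; rollback = 1.0 resets to the pre-trained
    weights. Both the name and the interpolation form are assumptions.
    """
    updated = []
    for w, g, w0 in zip(weights, grads, pretrained):
        stepped = w - lr * g                                  # ordinary gradient step
        updated.append(stepped - rollback * (stepped - w0))   # pull back toward w0
    return updated


pretrained = [0.0, 2.0]
weights = [1.0, 2.0]
grads = [0.0, 1.0]

half = rollback_step(weights, grads, pretrained, lr=0.5, rollback=0.5)
none = rollback_step(weights, grads, pretrained, lr=0.5, rollback=0.0)
full = rollback_step(weights, grads, pretrained, lr=0.5, rollback=1.0)
```

Here `none` matches a plain SGD step, `full` returns the pre-trained weights, and `half` lands midway between the two, which is the sense in which the rollback term limits drift from the pre-trained solution.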

Abstract

With the accumulation of resources in the era of big data and the rise of pre-trained models in deep learning, optimizing neural networks for various tasks often involves different strategies for fine-tuning pre-trained models versus training from scratch. However, existing optimizers primarily focus on reducing the loss function by updating model parameters, without fully addressing the unique demands of these two major paradigms. In this paper, we propose DualOpt, a novel approach that decouples optimization techniques specifically tailored for these distinct training scenarios. For training from scratch, we introduce real-time layer-wise weight decay, designed to enhance both convergence and generalization by aligning with the characteristics of weight updates and network architecture. More importantly, for fine-tuning, we integrate weight rollback with the optimizer, incorporating a…

Code & Models

Repositories

qklee-lz/OLOR-AAAI-2024 (GitHub)
