Modulating Regularization Frequency for Efficient Compression-Aware Model Training
Dongsoo Lee, Se Jung Kwon, Byeongwook Kim, Jeongin Yun, Baeseong Park, Yongkweon Jeon

TL;DR
This paper proposes regularization frequency, i.e., how often compression is applied during training, as a new regularization technique that aims to make compression-aware neural network training both more efficient and more accurate.
Contribution
The paper introduces regularization frequency as a new parameter for controlling the regularization strength induced by compression, with the goal of improving training efficiency and model performance.
Findings
Regularization frequency significantly impacts model accuracy.
Combining regularization frequency with compression ratio improves training outcomes.
Occasional compression can match or outperform frequent compression.
Abstract
While model compression is increasingly important because of large neural network size, compression-aware training is challenging as it needs sophisticated model modifications and longer training time. In this paper, we introduce regularization frequency (i.e., how often compression is performed during training) as a new regularization technique for a practical and efficient compression-aware training method. For various regularization techniques, such as weight decay and dropout, optimizing the regularization strength is crucial to improve generalization in Deep Neural Networks (DNNs). While model compression also demands the right amount of regularization, the regularization strength incurred by model compression has been controlled only by compression ratio. Throughout various experiments, we show that regularization frequency critically affects the regularization strength of model…
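To make the idea of regularization frequency concrete, the following is a minimal sketch of a training loop in which compression is applied only every `period` steps rather than at every step. It is not the authors' implementation: the toy linear-regression task, the magnitude-pruning compressor, and all names (`magnitude_prune`, `train`, `period`, `sparsity`) are assumptions chosen for illustration. Here `sparsity` plays the role of the compression ratio and `period` the role of the regularization frequency.

```python
import numpy as np


def magnitude_prune(w, sparsity):
    """Return a copy of the 1-D weight vector `w` with the smallest-magnitude
    fraction `sparsity` of its entries set to zero (an assumed, simple
    compressor standing in for whatever compression method is used)."""
    k = int(round(sparsity * w.size))
    out = w.copy()
    if k > 0:
        smallest = np.argsort(np.abs(w))[:k]
        out[smallest] = 0.0
    return out


def train(period, sparsity=0.5, steps=200, lr=0.05, seed=0):
    """Mini-batch SGD on a toy linear-regression problem, compressing the
    weights once every `period` steps (hypothetical setup for illustration)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(256, 16))
    true_w = rng.normal(size=16)
    y = X @ true_w

    w = np.zeros(16)
    n_compressions = 0
    for step in range(1, steps + 1):
        batch = rng.choice(256, size=32, replace=False)
        grad = X[batch].T @ (X[batch] @ w - y[batch]) / 32
        w -= lr * grad

        # Regularization frequency: compress only on every `period`-th step.
        if step % period == 0:
            w = magnitude_prune(w, sparsity)
            n_compressions += 1

    # Final compression so that runs with different periods end up
    # with the same sparsity and can be compared.
    w = magnitude_prune(w, sparsity)
    loss = float(np.mean((X @ w - y) ** 2))
    return w, loss, n_compressions


if __name__ == "__main__":
    for period in (1, 10, 50):
        _, loss, n = train(period)
        print(f"period={period:3d}  compressions={n:3d}  loss={loss:.4f}")
```

Running the script compares frequent compression (`period=1`) with occasional compression (`period=50`) at a fixed `sparsity`. The losses on this toy problem say nothing about the paper's results; the sketch only shows where the two knobs sit in a training loop.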
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification
Methods: Weight Decay
