On gradient descent training under data augmentation with on-line noisy copies
Katsuyuki Hagiwara

TL;DR
This paper analyzes how online data augmentation with noisy copies affects gradient descent training of linear regression, revealing regularization and acceleration effects, and extends insights to neural networks.
Contribution
It provides a theoretical analysis of online noisy data augmentation in gradient descent, showing its regularization and acceleration effects, and connects these findings to neural network training.
Findings
Data augmentation (DA) with online noisy copies is approximately equivalent to ridge regularization.
Training is accelerated proportionally to the number of noisy copies.
Results are confirmed through numerical experiments and extended to neural networks.
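The first two findings can be illustrated with a minimal sketch. It is not the paper's experiment. It assumes isotropic Gaussian input noise with standard deviation sigma, a sum-of-squared-errors loss, and synthetic data; all names and constants are hypothetical. Under these assumptions the expected gradient on a noisy copy equals the gradient of a ridge objective with penalty n * sigma^2, and summing over K copies scales that gradient by K, so the iterate should settle near the ridge solution rather than the ordinary least-squares one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data (all sizes and constants are illustrative).
n, d = 200, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

sigma = 0.5    # std of the noise injected into the inputs (assumed Gaussian)
K = 4          # number of noisy copies generated per epoch
lr = 1e-5      # step size, small enough for stable full-batch updates here
epochs = 3000


def train_online_noisy_copies(X, y, sigma, K, lr, epochs, rng):
    """Full-batch GD on the sum of squared errors over K fresh noisy copies."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = np.zeros_like(w)
        for _ in range(K):
            # A new noisy copy of the inputs is drawn at every epoch;
            # the targets y are left unchanged.
            X_noisy = X + sigma * rng.standard_normal(X.shape)
            grad += 2.0 * X_noisy.T @ (X_noisy @ w - y)
        w -= lr * grad
    return w


w_da = train_online_noisy_copies(X, y, sigma, K, lr, epochs, rng)

# Ridge solution with lambda = n * sigma^2. This follows from
# E[X_noisy^T X_noisy] = X^T X + n * sigma^2 * I for isotropic noise.
lam = n * sigma**2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

print("distance to ridge solution:", np.linalg.norm(w_da - w_ridge))
print("distance to OLS solution:  ", np.linalg.norm(w_da - w_ols))
```

Because the K per-copy gradients are summed, one update with step size lr moves roughly as far as a ridge gradient step with step size K * lr, which is one way to read the acceleration finding in this simplified setting. The mean-squared-error and mini-batch cases analyzed in the paper are not covered by this sketch.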
Abstract
In machine learning, data augmentation (DA) is a technique for improving generalization performance. In this paper, we mainly considered gradient descent for linear regression under DA using noisy copies of the dataset, in which noise is injected into the inputs. We analyzed the situation where random noisy copies are newly generated and used at each epoch, i.e., the case of on-line noisy copies. The work can therefore be viewed as an analysis of noise injection into the training process in the manner of DA, i.e., an on-line version of DA. We derived the averaged behavior of the training process in three situations: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We showed that, in all cases, training for DA with on-line copies is approximately equivalent to a ridge regularization whose regularization…
Taxonomy
Topics: Machine Learning and ELM · Neural Networks and Applications · Domain Adaptation and Few-Shot Learning
Methods: Linear Regression
