On gradient descent training under data augmentation with on-line noisy copies

Katsuyuki Hagiwara

arXiv:2206.03734 · stat.ML · June 17, 2022

TL;DR

This paper analyzes how online data augmentation with noisy copies affects gradient descent training of linear regression, revealing regularization and acceleration effects, and extends insights to neural networks.

Contribution

It provides a theoretical analysis of online noisy data augmentation in gradient descent, showing its regularization and acceleration effects, and connects these findings to neural network training.

Findings

01 Data augmentation (DA) with online noisy copies acts approximately as ridge regularization.

02 Training is accelerated in proportion to the number of noisy copies.

03 Results are confirmed through numerical experiments and extended to neural networks.
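The first two findings can be illustrated with a small simulation. This is a minimal sketch, not the paper's experiment: the toy data, the Gaussian input noise, the function names, and the penalty `lam = n * sigma**2` (which comes from the standard expected-loss calculation for input noise under a sum-of-squared-errors loss) are all assumptions made here for illustration.

```python
import numpy as np


def ridge_solution(X, y, lam):
    """Closed-form minimizer of 0.5*||y - Xw||^2 + 0.5*lam*||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def gd_online_noisy_copies(X, y, sigma, n_copies, lr, n_epochs, rng):
    """Full-batch gradient descent on half the sum of squared errors,
    drawing fresh noisy copies of the inputs at every epoch."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        grad = np.zeros(d)
        for _ in range(n_copies):
            # New Gaussian input noise at each epoch (the "on-line" part).
            X_noisy = X + sigma * rng.standard_normal((n, d))
            grad += X_noisy.T @ (X_noisy @ w - y)
        w = w - lr * grad
    return w


def make_toy_data(n, d, rng):
    """Hypothetical toy regression problem, used only for this sketch."""
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.1 * rng.standard_normal(n)
    return X, y


rng = np.random.default_rng(0)
X, y = make_toy_data(200, 5, rng)
sigma, n_copies = 0.5, 4

w_da = gd_online_noisy_copies(X, y, sigma, n_copies, lr=1e-4, n_epochs=1000, rng=rng)
w_ridge = ridge_solution(X, y, lam=X.shape[0] * sigma**2)  # assumed penalty n*sigma^2
w_ols = ridge_solution(X, y, lam=0.0)

print("distance to ridge:", np.linalg.norm(w_da - w_ridge))
print("distance to OLS:  ", np.linalg.norm(w_da - w_ols))
```

In this sketch the iterate should settle near the ridge solution rather than the ordinary least-squares one. Because the gradients of the copies are summed, raising `n_copies` scales the expected step by the same factor, which is one way the acceleration effect can appear under a sum-of-squares loss.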

Abstract

In machine learning, data augmentation (DA) is a technique for improving generalization performance. In this paper, we mainly consider gradient descent for linear regression under DA using noisy copies of the dataset, in which noise is injected into the inputs. We analyze the situation where random noisy copies are newly generated and used at each epoch, i.e., the case of on-line noisy copies. The analysis can therefore be viewed as a study of noise injection into the training process in the manner of DA, i.e., an on-line version of DA. We derive the averaged behavior of the training process in three situations: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We show that, in all cases, training for DA with on-line copies is approximately equivalent to a ridge regularization whose regularization…
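
The ridge connection described in the abstract can be motivated by a standard expected-loss calculation for input noise. The sketch below is a textbook argument under assumed notation (the symbols X, y, w, E, \sigma^2, K, and \eta are introduced here, not taken from the paper), and it covers only the sum-of-squared-errors case; it is not the paper's own derivation.

```latex
% Assumed notation: X in R^{n x d}, y in R^n, weights w,
% noise matrix E with i.i.d. zero-mean entries of variance \sigma^2, independent of w.
\begin{align}
\mathbb{E}_E \bigl\| y - (X + E) w \bigr\|^2
  &= \| y - X w \|^2
     - 2\,(y - X w)^\top \mathbb{E}[E]\, w
     + w^\top \mathbb{E}[E^\top E]\, w \\
  &= \| y - X w \|^2 + n \sigma^2 \| w \|^2 .
\end{align}
% K fresh copies per epoch, loss = (1/2) * sum of squared errors, step size \eta:
\begin{equation}
\mathbb{E}\bigl[ w_{t+1} \mid w_t \bigr]
  = w_t - \eta K \bigl[ (X^\top X + n \sigma^2 I)\, w_t - X^\top y \bigr].
\end{equation}
```

Under these assumptions the averaged update is a gradient step on a ridge-penalized objective with penalty n\sigma^2, and its effective step size is scaled by the number of copies K.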

Taxonomy

Topics: Machine Learning and ELM · Neural Networks and Applications · Domain Adaptation and Few-Shot Learning

Methods: Linear Regression