Effective Regularization Through Loss-Function Metalearning
Santiago Gonzalez, Xin Qiu, and Risto Miikkulainen

TL;DR
This paper theoretically analyzes how evolved loss functions, like those discovered by TaylorGLO, regularize neural networks by balancing error minimization and overfitting avoidance, leading to more robust models.
Contribution
It provides a theoretical framework explaining how evolved loss functions regularize neural networks and introduces a constraint for designing more effective loss functions.
Findings
Evolved loss functions balance error reduction and overfitting prevention.
Theoretical analysis applies to other regularization methods like label smoothing.
Networks trained with these loss functions are more robust to adversarial inputs.
Abstract
Evolutionary computation can be used to optimize several different aspects of neural network architectures. For instance, the TaylorGLO method discovers novel, customized loss functions, resulting in improved performance, faster training, and improved data utilization. A likely reason is that such functions discourage overfitting, leading to effective regularization. This paper demonstrates theoretically that this is indeed the case for TaylorGLO. Learning rule decomposition reveals that evolved loss functions balance two factors: the pull toward zero error, and a push away from it to avoid overfitting. This is a general principle that may be used to understand other regularization techniques as well (as demonstrated in this paper for label smoothing). The theoretical analysis leads to a constraint that can be utilized to find more effective loss functions in practice; the mechanism…
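The pull/push balance described in the abstract can be seen in label smoothing, which the paper names as another instance of the principle. The sketch below is our own illustration, not TaylorGLO and not the paper's learning rule decomposition. It uses standard softmax cross-entropy with label smoothing. The function names, K = 10 classes, α = 0.1, and the choice of a single "margin" logit are illustrative assumptions. The gradient of cross-entropy with respect to the true-class logit is p_true minus the smoothed target. It is negative (a pull toward zero error) while the model is underconfident. It becomes positive (a push away from zero error) once p_true exceeds the smoothed target.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def smoothed_targets(true_class, num_classes, alpha):
    """Label smoothing: mix the one-hot target with a uniform distribution."""
    off = alpha / num_classes
    return [1.0 - alpha + off if c == true_class else off
            for c in range(num_classes)]

def ce_logit_grad(logits, targets):
    """Gradient of cross-entropy w.r.t. the logits: softmax(z) - targets."""
    return [p - t for p, t in zip(softmax(logits), targets)]

# Illustrative settings (assumptions, not values from the paper).
K, alpha, true_class = 10, 0.1, 0
targets = smoothed_targets(true_class, K, alpha)
t_true = targets[true_class]

# With all other logits at 0, p_true = e^m / (e^m + K - 1), so the gradient
# on the true-class logit changes sign where p_true equals t_true.
threshold = math.log(t_true * (K - 1) / (1.0 - t_true))
print(f"sign change at margin {threshold:.3f}")

for margin in (1.0, 5.0, 10.0):
    logits = [margin] + [0.0] * (K - 1)
    p_true = softmax(logits)[true_class]
    g_true = ce_logit_grad(logits, targets)[true_class]
    # Gradient descent applies z <- z - lr * g, so a negative g raises the
    # true-class logit (pull) and a positive g lowers it (push).
    direction = "pull toward zero error" if g_true < 0 else "push away from zero error"
    print(f"margin={margin:5.1f} p_true={p_true:.4f} grad={g_true:+.4f} -> {direction}")
```

With plain one-hot targets (α = 0), the true-class gradient stays negative for every finite margin, so the loss always rewards more confidence. Smoothing moves the equilibrium to a finite margin. This is the kind of built-in resistance to overfitting that the abstract attributes to evolved loss functions.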
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications
