MTAdam: Automatic Balancing of Multiple Training Loss Terms
Itzik Malkiel, Lior Wolf

TL;DR
MTAdam is a novel optimization algorithm that automatically balances multiple loss terms during neural network training by dynamically adjusting gradient magnitudes per layer, reducing manual tuning effort.
Contribution
The paper introduces MTAdam, a generalized Adam optimizer that balances multiple loss terms automatically, adapting to training dynamics and layer-specific needs.
Findings
MTAdam achieves comparable or better training results than traditional methods.
It reduces the need for manual hyperparameter tuning of loss weights.
The method adapts dynamically to changing loss trade-offs during training.
Abstract
When training neural models, it is common to combine multiple loss terms. The balancing of these terms requires considerable human effort and is computationally demanding. Moreover, the optimal trade-off between the loss terms can change as training progresses, especially for adversarial terms. In this work, we generalize the Adam optimization algorithm to handle multiple loss terms. The guiding principle is that for every layer, the gradient magnitude of the terms should be balanced. To this end, the Multi-Term Adam (MTAdam) computes the derivative of each loss term separately, infers the first and second moments per parameter and loss term, and calculates a first moment for the magnitude per layer of the gradients arising from each loss. This magnitude is used to continuously balance the gradients across all layers, in a manner that both varies from one layer to the next and…
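The per-layer balancing described in the abstract can be illustrated with a minimal, single-layer sketch in plain Python. It follows the steps the abstract names (separate gradients per loss term, first and second moments per parameter and term, and a running first moment of each term's per-layer gradient magnitude). Several details are assumptions, not taken from the paper: the class name `MTAdamLayerSketch` and the parameter `beta_mag` are hypothetical, term 0 is treated as the anchor whose magnitude the other terms are rescaled to match, and the balanced terms are combined as the sum of bias-corrected first moments divided by the square root of the summed second moments.

```python
import math


class MTAdamLayerSketch:
    """Hypothetical single-layer sketch of multi-term Adam balancing.

    Assumptions (not taken from the paper): term 0 is the anchor whose
    gradient magnitude the other terms are rescaled to match, and the
    per-term Adam moments are combined as sum_i m_hat_i / sqrt(sum_i v_hat_i).
    """

    def __init__(self, n_params, n_terms, lr=1e-3, beta1=0.9, beta2=0.999,
                 beta_mag=0.9, eps=1e-8):
        self.lr, self.b1, self.b2 = lr, beta1, beta2
        self.bm, self.eps = beta_mag, eps
        # First and second moments per parameter and per loss term.
        self.m = [[0.0] * n_params for _ in range(n_terms)]
        self.v = [[0.0] * n_params for _ in range(n_terms)]
        # First moment of this layer's gradient magnitude, one per loss term.
        self.mag = [0.0] * n_terms
        self.t = 0

    def step(self, params, grads_per_term):
        """Return updated params given one gradient list per loss term."""
        self.t += 1
        # Track a running average of each term's gradient L2 norm for this layer.
        for i, g in enumerate(grads_per_term):
            norm = math.sqrt(sum(x * x for x in g))
            self.mag[i] = self.bm * self.mag[i] + (1 - self.bm) * norm
        m_sum = [0.0] * len(params)
        v_sum = [0.0] * len(params)
        for i, g in enumerate(grads_per_term):
            # Rescale term i so its running magnitude matches the anchor term.
            scale = self.mag[0] / (self.mag[i] + self.eps)
            for j, gj in enumerate(g):
                gs = scale * gj
                self.m[i][j] = self.b1 * self.m[i][j] + (1 - self.b1) * gs
                self.v[i][j] = self.b2 * self.v[i][j] + (1 - self.b2) * gs * gs
                # Bias-corrected moments, as in standard Adam.
                m_sum[j] += self.m[i][j] / (1 - self.b1 ** self.t)
                v_sum[j] += self.v[i][j] / (1 - self.b2 ** self.t)
        # Combine the balanced terms into a single Adam-style step.
        return [p - self.lr * m / (math.sqrt(v) + self.eps)
                for p, m, v in zip(params, m_sum, v_sum)]


# Usage: term 1's gradient is 100x larger than term 0's. After balancing,
# both terms count equally, so they cancel on the second coordinate.
opt = MTAdamLayerSketch(n_params=2, n_terms=2, lr=0.1)
new_params = opt.step([0.0, 0.0], [[1.0, 1.0], [100.0, -100.0]])
# new_params is approximately [-0.1414, 0.0]
```

For comparison, a single plain Adam step on the summed gradient [101, -99] would move the second coordinate by roughly `lr` in term 1's direction, because the larger term dominates the sum. In this sketch the balancing is done per layer; a full optimizer would keep one such state per layer.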
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
Methods: Adam
