Adaptive Mixing of Auxiliary Losses in Supervised Learning
Durga Sivasubramanian, Ayush Maheshwari, Pradeep Shenoy, Prathosh AP, and Ganesh Ramakrishnan

TL;DR
This paper introduces AMAL, a meta-learning approach that adaptively combines auxiliary losses at the instance level in supervised learning, improving performance in knowledge distillation and rule-denoising tasks.
Contribution
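The paper proposes a bi-level optimization framework that learns instance-level mixing weights between the supervised loss and the auxiliary loss, using validation data to guide the weights, together with a practical meta-learning procedure for solving it that applies across several supervised learning scenarios.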
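One way to write such a bi-level objective is sketched below. This is an illustration, not the paper's exact notation, and every symbol is an assumption introduced here: \lambda_i in [0,1] is the mixing weight of training instance i, \ell_s is the supervised loss, \ell_a is the auxiliary loss (for example a distillation loss against a teacher output a_i), N is the number of training instances, and \mathcal{V} is the validation set. The convex-combination form of the mix is also an assumption.

```latex
\begin{aligned}
\lambda^{*} &= \arg\min_{\lambda \in [0,1]^{N}} \;
  \frac{1}{|\mathcal{V}|} \sum_{j \in \mathcal{V}}
  \ell_s\big(f_{\theta^{*}(\lambda)}(x_j),\, y_j\big) \\
\text{s.t.}\quad
\theta^{*}(\lambda) &= \arg\min_{\theta} \;
  \frac{1}{N} \sum_{i=1}^{N}
  \Big[ \lambda_i\, \ell_s\big(f_{\theta}(x_i),\, y_i\big)
      + (1-\lambda_i)\, \ell_a\big(f_{\theta}(x_i),\, a_i\big) \Big]
\end{aligned}
```

The inner problem trains the model under a fixed set of weights; the outer problem selects the weights whose trained model performs best on held-out data.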
Findings
AMAL outperforms baseline methods in knowledge distillation tasks.
AMAL improves rule-denoising accuracy over existing approaches.
Empirical analysis reveals how adaptive loss mixing enhances learning performance.
Abstract
In several supervised learning scenarios, auxiliary losses are used in order to introduce additional information or constraints into the supervised learning objective. For instance, knowledge distillation aims to mimic outputs of a powerful teacher model; similarly, in rule-based approaches, weak labeling information is provided by labeling functions which may be noisy rule-based approximations to true labels. We tackle the problem of learning to combine these losses in a principled manner. Our proposal, AMAL, uses a bi-level optimization criterion on validation data to learn optimal mixing weights, at an instance level, over the training data. We describe a meta-learning approach towards solving this bi-level objective and show how it can be applied to different scenarios in supervised learning. Experiments in a number of knowledge distillation and rule-denoising domains show that AMAL…
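The idea of learning instance-level mixing weights from validation data can be illustrated with a toy sketch. The code below is not the authors' implementation: it assumes a linear model, squared-error supervised and auxiliary losses, a convex mix `lam_i * l_s + (1 - lam_i) * l_a`, and a one-step lookahead approximation to the bi-level problem. All function and variable names (`mixed_grad`, `lambda_meta_grad`, `meta_step`, `fit`) are hypothetical.

```python
# Toy sketch (assumptions: linear model, squared losses, one-step lookahead).
# y holds hard training labels, t holds auxiliary targets (e.g. teacher outputs).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mixed_grad(theta, X, y, t, lam):
    """Gradient w.r.t. theta of mean_i [lam_i * l_s + (1 - lam_i) * l_a]."""
    n = len(X)
    grad = [0.0] * len(theta)
    for x_i, y_i, t_i, l_i in zip(X, y, t, lam):
        p = dot(x_i, theta)
        resid = l_i * (p - y_i) + (1.0 - l_i) * (p - t_i)
        for k, x_ik in enumerate(x_i):
            grad[k] += resid * x_ik / n
    return grad

def sgd_step(theta, grad, lr):
    return [th - lr * g for th, g in zip(theta, grad)]

def val_loss(theta, Xv, yv):
    """Mean squared-error loss (with factor 1/2) on the validation set."""
    return sum(0.5 * (dot(x, theta) - v) ** 2 for x, v in zip(Xv, yv)) / len(yv)

def val_grad(theta, Xv, yv):
    m = len(Xv)
    grad = [0.0] * len(theta)
    for x, v in zip(Xv, yv):
        r = dot(x, theta) - v
        for k, x_k in enumerate(x):
            grad[k] += r * x_k / m
    return grad

def lambda_meta_grad(theta, X, y, t, lam, Xv, yv, lr):
    """d val_loss(theta') / d lam_i, where theta' is one SGD step on the mix.

    Chain rule: d theta' / d lam_i = -lr * (t_i - y_i) * x_i / n.
    """
    theta_next = sgd_step(theta, mixed_grad(theta, X, y, t, lam), lr)
    g_val = val_grad(theta_next, Xv, yv)
    n = len(X)
    return [-lr * (t_i - y_i) * dot(x_i, g_val) / n
            for x_i, y_i, t_i in zip(X, y, t)]

def meta_step(theta, lam, X, y, t, Xv, yv, lr=0.1, meta_lr=1.0):
    """Update the weights from the validation signal, then update the model."""
    g_lam = lambda_meta_grad(theta, X, y, t, lam, Xv, yv, lr)
    lam = [min(1.0, max(0.0, l - meta_lr * g)) for l, g in zip(lam, g_lam)]
    theta = sgd_step(theta, mixed_grad(theta, X, y, t, lam), lr)
    return theta, lam

def fit(theta, lam, X, y, t, Xv, yv, steps=200, lr=0.1, meta_lr=1.0):
    for _ in range(steps):
        theta, lam = meta_step(theta, lam, X, y, t, Xv, yv, lr, meta_lr)
    return theta, lam
```

A call such as `fit([0.0, 0.0], [0.5] * len(X), X, y, t, Xv, yv)` alternates weight and model updates. In this sketch the meta-gradient for instance i is proportional to `t_i - y_i`, so instances whose hard label and auxiliary target agree keep their initial weight; only instances where the two signals disagree are reweighted by the validation data. With a neural network, the same lookahead gradient would normally be obtained by automatic differentiation rather than derived by hand.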
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Imbalanced Data Classification Techniques
Methods: Knowledge Distillation · Adaptive Robust Loss
