Balancing Average and Worst-case Accuracy in Multitask Learning

Paul Michel; Sebastian Ruder; Dani Yogatama

arXiv:2110.05838·cs.LG·October 13, 2021

TL;DR

This paper introduces Lookahead-DRO (L-DRO), a method that improves worst-case accuracy in multitask learning by dynamically re-weighting task losses during training while still training on as many tasks as possible.

Contribution

The paper proposes Lookahead-DRO (L-DRO), a variant of distributionally robust optimization (DRO) that anticipates the interaction between tasks during training in order to balance average and worst-case accuracy in multitask learning.

Findings

01. L-DRO improves worst-case accuracy in synthetic and real benchmarks.

02. L-DRO achieves better trade-offs between average and worst-case accuracy.

03. L-DRO has minimal computational overhead compared to baselines.

Abstract

When training and evaluating machine learning models on a large number of tasks, it is important to not only look at average task accuracy -- which may be biased by easy or redundant tasks -- but also worst-case accuracy (i.e. the performance on the task with the lowest accuracy). In this work, we show how to use techniques from the distributionally robust optimization (DRO) literature to improve worst-case performance in multitask learning. We highlight several failure cases of DRO when applied off-the-shelf and present an improved method, Lookahead-DRO (L-DRO), which mitigates these issues. The core idea of L-DRO is to anticipate the interaction between tasks during training in order to choose a dynamic re-weighting of the various task losses, which will (i) lead to minimal worst-case loss and (ii) train on as many tasks as possible. After demonstrating the efficacy of L-DRO on a…
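
To make the two metrics and the idea of dynamic re-weighting concrete, here is a minimal sketch in Python. It is not the authors' Lookahead-DRO: the lookahead step is not described on this page, so the update below is a generic exponentiated-gradient re-weighting of the kind commonly used in DRO-style training. The task names, accuracies, the `step_size` value, and all function names are hypothetical and chosen only for illustration.

```python
import math


def average_and_worst_case(accuracies):
    """Return (mean accuracy, lowest per-task accuracy) for a dict of task -> accuracy."""
    values = list(accuracies.values())
    return sum(values) / len(values), min(values)


def update_task_weights(weights, losses, step_size=1.0):
    """One exponentiated-gradient step on the task weights.

    Tasks with a higher current loss receive a larger weight. This is a
    generic DRO-style update (an assumption), not the L-DRO rule itself.
    """
    scaled = [w * math.exp(step_size * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]  # renormalize so the weights sum to 1


def weighted_loss(weights, losses):
    """Training objective under the current task weights."""
    return sum(w * loss for w, loss in zip(weights, losses))


# Hypothetical per-task accuracies: two easy tasks hide one hard task.
task_accuracy = {"task_a": 0.95, "task_b": 0.90, "task_c": 0.55}
avg_acc, worst_acc = average_and_worst_case(task_accuracy)

# Use (1 - accuracy) as a stand-in for the per-task loss.
task_losses = [1.0 - acc for acc in task_accuracy.values()]

uniform = [1.0 / len(task_losses)] * len(task_losses)
reweighted = update_task_weights(uniform, task_losses, step_size=2.0)

print(f"average accuracy:    {avg_acc:.2f}")
print(f"worst-case accuracy: {worst_acc:.2f}")
print("re-weighted tasks:  ", [round(w, 3) for w in reweighted])
```

In this toy setting the average accuracy looks healthy while the worst-case accuracy exposes the weak task, and the re-weighting step shifts training emphasis toward that task. According to the abstract, L-DRO additionally aims to keep training on as many tasks as possible, which a plain update like this one does not guarantee.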

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms · Multimodal Machine Learning Applications