Dual-Balancing for Multi-Task Learning

Baijiong Lin; Weisen Jiang; Feiyang Ye; Yu Zhang; Pengguang Chen; Ying-Cong Chen; Shu Liu; Ivor W. Tsang; James T. Kwok

arXiv:2308.12029·cs.LG·November 27, 2025·5 cites

TL;DR

This paper introduces Dual-Balancing Multi-Task Learning (DB-MTL), a novel approach that balances tasks by adjusting loss scales and gradient magnitudes, leading to improved performance across benchmarks.

Contribution

The paper proposes a dual-balancing method for multi-task learning that simultaneously addresses loss and gradient disparities, a novel approach in the field.

Findings

01 DB-MTL outperforms existing methods on benchmark datasets.

02 Loss-scale balancing via logarithm transformation improves task harmony.

03 Gradient normalization enhances training stability and performance.

Abstract

Multi-task learning aims to learn multiple related tasks simultaneously and has achieved great success in various fields. However, the disparity in loss and gradient scales among tasks often leads to performance compromises, and the balancing of tasks remains a significant challenge. In this paper, we propose Dual-Balancing Multi-Task Learning (DB-MTL) to achieve task balancing from both the loss and gradient perspectives. Specifically, DB-MTL achieves loss-scale balancing by performing logarithm transformation on each task loss, and rescales gradient magnitudes by normalizing all task gradients to comparable magnitudes using the maximum gradient norm. Extensive experiments on a number of benchmark datasets demonstrate that DB-MTL consistently performs better than the current state-of-the-art.
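The two balancing steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `dual_balance`, the `eps` constant, and the plain-list representation of gradients are assumptions made for illustration, and any details of the method not stated in the abstract are left out. The sketch relies on the fact that the gradient of log(loss) equals the gradient of the loss divided by the loss, then rescales every task gradient to the largest norm among them before summing.

```python
import math


def dual_balance(losses, grads, eps=1e-8):
    """Hypothetical helper: combine per-task gradients of shared parameters.

    losses: list of positive task losses (floats)
    grads:  list of per-task gradient vectors (lists of floats, equal length)
    Returns the combined gradient and the per-task norms after the log step.
    """
    # Loss-scale balancing: the gradient of log(l) is (1 / l) * gradient of l.
    log_grads = [
        [g / (loss + eps) for g in grad]
        for loss, grad in zip(losses, grads)
    ]

    # Euclidean norm of each log-transformed task gradient.
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in log_grads]
    max_norm = max(norms)

    # Gradient-magnitude balancing: rescale every task gradient so its
    # norm matches the maximum norm, then sum over tasks.
    combined = [0.0] * len(log_grads[0])
    for grad, norm in zip(log_grads, norms):
        scale = max_norm / (norm + eps)
        for i, g in enumerate(grad):
            combined[i] += scale * g
    return combined, norms


# Two tasks whose raw losses and gradients differ by several orders of magnitude.
combined, norms = dual_balance(
    losses=[100.0, 0.01],
    grads=[[10.0, 0.0], [0.0, 0.001]],
)
```

In this toy input the raw gradient norms differ by a factor of 10,000, yet after both steps each task contributes a component of roughly equal size to the combined update.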

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5: marginally below the acceptance threshold · Confidence 4

Strengths

Quality: The authors execute extensive experiments which support their conclusions. The postulations and the experimental results correlate well, strengthening the paper’s credibility. Clarity: The paper is well-written and very easy to understand. Originality: The maximum-norm strategy in Section 3.2 has some novelty.

Weaknesses

There is a significant lack of novelty in the presented techniques. The first part discusses scale-balancing loss transformation, which utilizes the common method of applying a log transformation to loss. This technique has already been mentioned by Nash-MTL and therefore doesn't contribute anything substantially new to the field. The second section of the method, gradient normalization, is a modification of the GradNorm technique. The presented maximum-norm strategy essentially ensures all smal

Reviewer 02 · Rating 5: marginally below the acceptance threshold · Confidence 4

Strengths

1. **Simplicity of the Method:** The loss-scale balancing and the gradient-magnitude balancing of the approach are both commendably straightforward. 2. **Sufficient and Effective Experiments:** The paper demonstrates the effectiveness of DB-MTL through extensive validation across three distinct scenarios and five datasets. The proposed method is optimal on all datasets.

Weaknesses

1. **Imprecise Overview of the MTL Objective:** The MTL objective (Equation 1) in Section 2 is imprecise. It is applicable primarily to existing loss balancing and certain gradient balancing methods. For example, in some gradient balancing methods like PCGrad and CAGrad, the weights of task-specific parameters are all 1, and in all hybrid balancing methods, the weights of task-specific and task-shared parameters are different. 2. **Omission of Relevant Literature:** (1) The DB-MTL method, albe

Reviewer 03 · Rating 3: reject, not good enough · Confidence 4

Strengths

- The method is well-motivated. Experiments on various datasets demonstrate state-of-the-art performance. Ablations validate the efficacy of each component. - The dual-balancing framework is reasonable. The loss-scale balancing via logarithmic transformation helps existing gradient methods. Normalizing gradients by the maximum norm is simple yet effective. - The problem setup and methods are clearly explained.

Weaknesses

- The novelty of this paper is limited. Loss and gradient balance have been extensively explored, and this paper does not have any new technical insights. It is only a combination of the existing works. - The theoretical analysis is not sufficient to support the advantage of the proposed method. Proving convergence guarantees or other theoretical properties of DB-MTL could reveal more advantages over existing methods. - Detailed experimental analysis on why DB-MTL outperforms certain baselines i

Code & Models

Repositories

median-research-group/libmtl
pytorch

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Machine Learning and ELM