Dual-Balancing for Multi-Task Learning
Baijiong Lin, Weisen Jiang, Feiyang Ye, Yu Zhang, Pengguang Chen, Ying-Cong Chen, Shu Liu, Ivor W. Tsang, James T. Kwok

TL;DR
This paper introduces Dual-Balancing Multi-Task Learning (DB-MTL), which balances tasks by log-transforming each task loss and normalizing all task gradients to the maximum gradient norm. The authors report that it outperforms the current state-of-the-art on a number of benchmark datasets.
Contribution
The paper proposes a dual-balancing method for multi-task learning that addresses loss-scale and gradient-magnitude disparities among tasks simultaneously.
Findings
DB-MTL outperforms existing methods on benchmark datasets.
Loss-scale balancing via logarithm transformation improves task harmony.
Gradient normalization enhances training stability and performance.
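A short derivation may help show why the logarithm transformation balances loss scales. The notation is introduced here for illustration and is not taken from the paper: $\ell_k(\theta) > 0$ is the loss of task $k$, $\theta$ the shared parameters, and $c > 0$ an arbitrary rescaling of that loss.

```latex
\nabla_\theta \log\bigl(c\,\ell_k(\theta)\bigr)
  = \nabla_\theta \bigl(\log c + \log \ell_k(\theta)\bigr)
  = \frac{\nabla_\theta \ell_k(\theta)}{\ell_k(\theta)}
```

The constant $c$ drops out, so under this assumption the gradient of the log-transformed loss does not depend on the scale in which the task loss is measured.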
Abstract
Multi-task learning aims to learn multiple related tasks simultaneously and has achieved great success in various fields. However, the disparity in loss and gradient scales among tasks often leads to performance compromises, and the balancing of tasks remains a significant challenge. In this paper, we propose Dual-Balancing Multi-Task Learning (DB-MTL) to achieve task balancing from both the loss and gradient perspectives. Specifically, DB-MTL achieves loss-scale balancing by performing logarithm transformation on each task loss, and rescales gradient magnitudes by normalizing all task gradients to comparable magnitudes using the maximum gradient norm. Extensive experiments on a number of benchmark datasets demonstrate that DB-MTL consistently performs better than the current state-of-the-art.
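The two steps named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `db_mtl_direction`, the `eps` constant, and the list-based gradient representation are all hypothetical, and the paper's full algorithm may include details the abstract does not mention.

```python
import math


def db_mtl_direction(losses, grads, eps=1e-8):
    """Combine per-task gradients using the two steps described in the abstract.

    losses: positive task losses, one float per task (assumed > 0).
    grads:  per-task gradients of the raw losses w.r.t. the shared
            parameters, each a list of floats of equal length.
    Returns the combined update direction and the per-task norms.
    """
    # Step 1 (loss-scale balancing): the gradient of log(loss) is grad / loss.
    log_grads = [
        [g / (loss + eps) for g in grad]
        for loss, grad in zip(losses, grads)
    ]

    # Step 2 (gradient-magnitude balancing): rescale every task gradient
    # so that its norm matches the largest norm among the tasks.
    norms = [math.sqrt(sum(g * g for g in grad)) for grad in log_grads]
    target = max(norms)

    combined = [0.0] * len(log_grads[0])
    for grad, norm in zip(log_grads, norms):
        scale = target / (norm + eps)
        for i, g in enumerate(grad):
            combined[i] += scale * g
    return combined, norms


# Hypothetical example: two tasks whose losses differ by four orders of magnitude.
losses = [100.0, 0.01]
grads = [[300.0, 400.0], [0.0, 0.005]]
combined, norms = db_mtl_direction(losses, grads)
print(combined)  # approximately [3.0, 9.0]
```

After the log transformation the two task gradients have norms of roughly 5 and 0.5; scaling the second up to the maximum norm gives both tasks an equal-magnitude contribution to the combined direction.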
Peer Reviews
Decision: Submitted to ICLR 2024
Quality: The authors execute extensive experiments which support their conclusions. The postulations and the experimental results correlate well, strengthening the paper’s credibility. Clarity: The paper is well-written and very easy to understand. Originality: The maximum-norm strategy in Section 3.2 has some novelty.
There is a significant lack of novelty in the presented techniques. The first part discusses scale-balancing loss transformation, which utilizes the common method of applying a log transformation to loss. This technique has already been mentioned by Nash-MTL and therefore doesn't contribute anything substantially new to the field. The second section of the method, gradient normalization, is a modification of the GradNorm technique. The presented maximum-norm strategy essentially ensures all smal
1. **Simplicity of the Method:** The loss-scale balancing and the gradient-magnitude balancing of the approach are both commendably straightforward.
2. **Sufficient and Effective Experiments:** The paper demonstrates the effectiveness of DB-MTL through extensive validation across three distinct scenarios and five datasets. The proposed method is optimal on all datasets.
1. **Imprecise Overview of the MTL Objective:** The MTL objective (Equation 1) in Section 2 is imprecise. It is applicable primarily to existing loss balancing and certain gradient balancing methods. For example, in some gradient balancing methods like PCGrad and CAGrad, the weights of task-specific parameters are all 1, and in all hybrid balancing methods, the weights of task-specific and task-shared parameters are different.
2. **Omission of Relevant Literature:** (1) The DB-MTL method, albe
- The method is well-motivated. Experiments on various datasets demonstrate state-of-the-art performance. Ablations validate the efficacy of each component.
- The dual-balancing framework is reasonable. The loss-scale balancing via logarithmic transformation helps existing gradient methods. Normalizing gradients by the maximum norm is simple yet effective.
- The problem setup and methods are clearly explained.
- The novelty of this paper is limited. Loss and gradient balancing have been extensively explored, and this paper does not offer any new technical insights. It is only a combination of existing works.
- The theoretical analysis is not sufficient to support the advantage of the proposed method. Proving convergence guarantees or other theoretical properties of DB-MTL could reveal more advantages over existing methods.
- Detailed experimental analysis of why DB-MTL outperforms certain baselines i
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Machine Learning and ELM
