Selective Task Group Updates for Multi-Task Optimization
Wooseong Jeong, Kuk-Jin Yoon

TL;DR
This paper introduces a novel multi-task learning optimization method that adaptively groups tasks and updates them sequentially, improving performance by better learning task-specific parameters and addressing negative transfer.
Contribution
It proposes an adaptive task grouping algorithm and proximal inter-task affinity to enhance multi-task learning, overcoming limitations of previous gradient-based methods.
Findings
Outperforms previous multi-task optimization methods.
Scalable to different architectures and task numbers.
Significantly improves task-specific parameter learning.
Abstract
Multi-task learning enables the acquisition of task-generic knowledge by training multiple tasks within a unified architecture. However, training all tasks together in a single architecture can lead to performance degradation, known as negative transfer, which is a main concern in multi-task learning. Previous works have addressed this issue by optimizing the multi-task network through gradient manipulation or weighted loss adjustments. However, their optimization strategy focuses on addressing task imbalance in shared parameters, neglecting the learning of task-specific parameters. As a result, they show limitations in mitigating negative transfer, since the learning of shared space and task-specific information influences each other during optimization. To address this, we propose a different approach to enhance multi-task performance by selectively grouping tasks and updating them…
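To make the idea of "grouping tasks and updating the groups sequentially" concrete, here is a minimal toy sketch in pure Python. It is not the authors' algorithm: the abstract above does not specify how proximal inter-task affinity is computed, so the sketch assumes a simple lookahead affinity (the relative change in task j's loss after a gradient step on task i), a greedy grouping rule (tasks share a group only when their mutual affinity is positive), and quadratic toy losses on a shared parameter vector. All function names, the grouping rule, and the toy losses are hypothetical and chosen only for illustration.

```python
# Toy sketch (assumptions throughout): each task i has loss
# L_i(theta) = 0.5 * ||theta - c_i||^2 on a shared parameter vector theta.

def loss(theta, center):
    return 0.5 * sum((t - c) ** 2 for t, c in zip(theta, center))

def grad(theta, center):
    # Gradient of the quadratic toy loss with respect to theta.
    return [t - c for t, c in zip(theta, center)]

def step(theta, g, lr):
    # One plain gradient-descent step.
    return [t - lr * gi for t, gi in zip(theta, g)]

def affinity(theta, centers, i, j, lr):
    """Assumed lookahead affinity of task i on task j:
    1 - L_j(theta after a step on task i) / L_j(theta).
    Positive means the step on task i also lowered task j's loss."""
    before = loss(theta, centers[j])
    if before == 0.0:
        return 0.0  # task j is already at its optimum; treat as neutral
    after = loss(step(theta, grad(theta, centers[i]), lr), centers[j])
    return 1.0 - after / before

def group_tasks(theta, centers, lr):
    """Hypothetical greedy rule: a task joins the first group whose members
    all have positive affinity with it in both directions."""
    groups = []
    for i in range(len(centers)):
        for g in groups:
            if all(affinity(theta, centers, i, j, lr) > 0
                   and affinity(theta, centers, j, i, lr) > 0 for j in g):
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

def sequential_update(theta, centers, groups, lr):
    """Update one group at a time; each group steps on the sum of its
    members' gradients, evaluated at the parameters left by the previous group."""
    for g in groups:
        total = [sum(col) for col in zip(*(grad(theta, centers[i]) for i in g))]
        theta = step(theta, total, lr)
    return theta

# Tasks 0 and 1 pull theta in similar directions; task 2 pulls the opposite way.
centers = [[1.0, 0.0], [1.0, 0.2], [-1.0, 0.0]]
theta0 = [0.0, 0.0]
groups = group_tasks(theta0, centers, lr=0.1)   # [[0, 1], [2]]
theta1 = sequential_update(theta0, centers, groups, lr=0.1)
```

In this toy setting the conflicting task ends up in its own group, so its update is applied after, rather than averaged with, the updates of the two compatible tasks. A real multi-task network would also carry task-specific parameters, which this sketch omits.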
Peer Reviews
Decision: ICLR 2025 Poster
1. Solid analysis from a theoretical perspective: The paper provides theoretical insights to explain the effectiveness of the proposed method, including (i) the benefits of sequential updating of groups, and (ii) the role of incorporating task-specific parameters in reducing conflicts. 2. The paper is well-written and well-organized.
1. Based on proximal inter-task affinity, what principle do we use for task grouping? Discussion on other principles should be included. For example, in [1], they use the Fisher Information Matrix, grouping the most heterogeneous tasks to mitigate conflicts. 2. The motivation for introducing proximal inter-task affinity: After reading Appendix A.1, I still find it difficult to understand the motivation for introducing proximal inter-task affinity. 3. Sequential learning on tasks [1], domains [
1. Experimental Analysis: In the field of deep learning, the analysis of batch sequences has not been extensively explored. The authors argue that grouping certain objectives in multi-objective problems can be significantly beneficial from a global perspective, and they demonstrate this experimentally. In cases where the multi-task learning (MTL) results outperform those of single-task learning (STL), the authors' method consistently achieves the highest performance, which serves as strong empiric
In my understanding, some questions remain regarding the actual utility of certain theoretical approaches. The author addresses the utility of multiple objectives in a local context, but optimization in the field of deep learning is far more complex. In practice, grouping the same classes together for optimization in classification tasks may be optimal for the currently updated classes locally; however, it is challenging to reach a global optimum. I would like to see additional experimental eval
1. This paper investigates an important problem of multi-task learning. 2. This paper is well-written and easy to follow. 3. Realizing that traditional solutions focus on optimizing shared parameters but neglect task-specific ones, the authors delve into the concept of proximal inter-task affinity, making this paper well-motivated. 4. The proposed method is new to me and gives a fresh perspective to further improve the performance of MTL. 5. This approach is said to improve multi-task perform
1. The task grouping result in Figure 3c does not appear to have converged. Will the number of groups increase further as the number of iterations grows? 2. Why is Nash-MTL not reported in Table 2? 3. In the theoretical analysis (Section 4), the authors explain how this sequential update strategy can improve multi-task performance from an optimization standpoint. What about the generalization standpoint? I think the generalization of a model is more important. 4. In real-world applications, a typical MTL prob
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Optimization and Search Problems
