Gradient Similarity Surgery in Multi-Task Deep Learning

Thomas Borsani; Andrea Rosani; Giuseppe Nicosia; Giuseppe Di Fatta

arXiv:2506.06130·cs.LG·June 9, 2025

TL;DR

This paper introduces SAM-GS, a novel gradient surgery method for multi-task deep learning that uses gradient similarity to improve training stability and convergence by addressing conflicting gradients.

Contribution

The paper proposes SAM-GS, a scalable gradient surgery technique based on gradient magnitude similarity, enhancing multi-task learning optimization.

Findings

1. SAM-GS improves convergence speed in multi-task learning.

2. Gradient similarity regularizes gradient aggregation effectively.

3. Experimental results show SAM-GS outperforms existing methods.

Abstract

The multi-task learning (MTL) paradigm aims to simultaneously learn multiple tasks within a single model, capturing higher-level, more general hidden patterns that are shared by the tasks. In deep learning, a significant challenge in the backpropagation training process is the design of advanced optimisers to improve the convergence speed and stability of the gradient descent learning rule. In particular, in multi-task deep learning (MTDL) the multitude of tasks may generate potentially conflicting gradients that would hinder the concurrent convergence of the diverse loss functions. This challenge arises when the gradients of the task objectives have either different magnitudes or opposite directions, causing one or a few to dominate or to interfere with each other, thus degrading the training process. Gradient surgery methods address the problem explicitly dealing with conflicting…
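The two kinds of conflict named in the abstract (opposite directions and mismatched magnitudes) can be illustrated with a small diagnostic sketch. This is not the SAM-GS algorithm, which the excerpt above does not specify. It only shows two measures commonly used in the gradient surgery literature: cosine similarity for direction, and a magnitude similarity of the form 2‖u‖‖v‖ / (‖u‖² + ‖v‖²). The function names, the example gradients, and the choice of this particular magnitude formula are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a gradient vector."""
    return math.sqrt(dot(u, u))


def cosine_similarity(u, v):
    """Direction agreement in [-1, 1]; a negative value means the
    gradients point in opposing directions. Assumes non-zero vectors."""
    return dot(u, v) / (norm(u) * norm(v))


def magnitude_similarity(u, v):
    """2|u||v| / (|u|^2 + |v|^2): equals 1 when the norms match and
    approaches 0 when one gradient dominates. Assumes non-zero vectors."""
    nu, nv = norm(u), norm(v)
    return 2 * nu * nv / (nu ** 2 + nv ** 2)


# Hypothetical per-task gradients with respect to the same shared parameters.
g_task_a = [1.0, 0.0]
g_task_b = [-10.0, 10.0]

cos = cosine_similarity(g_task_a, g_task_b)
mag = magnitude_similarity(g_task_a, g_task_b)

print(f"cosine similarity:    {cos:.3f}")  # negative: directions conflict
print(f"magnitude similarity: {mag:.3f}")  # well below 1: task B dominates
```

In this toy case both problems appear at once: the gradients partly oppose each other, and task B's gradient is roughly fourteen times larger, so a plain sum of the two would be driven almost entirely by task B.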

Code & Models

Models

🤗 gustavlangstroem/Microexpert_NG (model)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications