Jacobian Descent for Multi-Objective Optimization
Pierre Quinton, Valérian Rey

TL;DR
This paper introduces Jacobian descent, a gradient-based optimization method for multi-objective problems that resolves conflicts between objectives by projecting gradients and comes with stronger convergence guarantees, with applications to instance-wise risk minimization.
Contribution
The paper proposes Jacobian descent, a new algorithm for multi-objective optimization that projects gradients to resolve conflicts and provides improved convergence guarantees.
Findings
Jacobian descent outperforms existing methods when the objectives conflict.
The approach enables effective instance-wise risk minimization in machine learning.
Empirical results show promising performance on image classification tasks.
Abstract
Many optimization problems require balancing multiple conflicting objectives. As gradient descent is limited to single-objective optimization, we introduce its direct generalization: Jacobian descent (JD). This algorithm iteratively updates parameters using the Jacobian matrix of a vector-valued objective function, in which each row is the gradient of an individual objective. While several methods to combine gradients already exist in the literature, they are generally hindered when the objectives conflict. In contrast, we propose projecting gradients to fully resolve conflict while ensuring that they preserve an influence proportional to their norm. We prove significantly stronger convergence guarantees with this approach, supported by our empirical results. Our method also enables instance-wise risk minimization (IWRM), a novel learning paradigm in which the loss of each training…
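The update described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the toy quadratic objectives, the step sizes, and the projected-gradient inner solver are all assumptions made for illustration. The aggregator projects each weighted gradient J^T e_i onto the cone {d : J d >= 0} by approximately minimizing v^T G v subject to v >= e_i, where G = J J^T is the Gramian of the Jacobian, and then averages the results.

```python
import numpy as np


def jacobian(x, targets):
    """Jacobian of the toy objectives f_i(x) = ||x - targets[i]||^2.

    Row i is the gradient of f_i at x. The objectives are hypothetical
    and chosen only to make the example self-contained.
    """
    return 2.0 * (x[None, :] - targets)


def project_weights(gramian, lower, steps=500):
    """Approximately solve min_v v^T G v subject to v >= lower.

    Uses plain projected gradient descent with step 1 / (2 * lambda_max),
    an illustrative choice rather than a tuned solver.
    """
    lam_max = np.linalg.eigvalsh(gramian)[-1]
    if lam_max <= 0.0:  # all gradients are zero, nothing to project
        return lower.copy()
    step = 1.0 / (2.0 * lam_max)
    v = lower.copy()
    for _ in range(steps):
        v = np.maximum(lower, v - step * 2.0 * (gramian @ v))
    return v


def aggregate_nonconflicting(J):
    """Combine the rows of J into one direction d with J @ d >= 0 (approximately)."""
    m = J.shape[0]
    gramian = J @ J.T
    weights = np.mean([project_weights(gramian, e) for e in np.eye(m)], axis=0)
    return J.T @ weights


def jacobian_descent(x0, targets, lr=0.1, steps=200):
    """Iterate x <- x - lr * aggregate(J(x)) on the toy objectives."""
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * aggregate_nonconflicting(jacobian(x, targets))
    return x
```

For the Jacobian [[1, 0], [-1, 1]], whose two rows have a negative inner product, this aggregator returns a direction close to (0.25, 0.75), which has a non-negative inner product with both rows. Swapping `aggregate_nonconflicting` for a plain mean of the rows would recover ordinary gradient descent on the average objective.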
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
* The proposed JD framework generalizes many existing MOO approaches, providing insights into the similarities, differences, and behaviors of these approaches.
* The unconflicting projection of gradients aggregator, UPGrad, has provable convergence guarantees and demonstrates faster convergence empirically in practical applications compared to existing MOO methods.
* The paper is fairly easy to follow, the proposed methods are well motivated, and the build-up to the proposed methods are clearly
* The theoretical convergence guarantees are only asymptotic, so it is unclear how JD compares with standard gradient descent in terms of theoretical convergence rate.
* As the authors acknowledge, the proposed method is computationally intensive. It is also unclear, even with the outlined method for increasing computational efficiency, whether the computational overhead can be reduced (compared to, for example, standard SGD on the ERM objective) due to the need to compute the Jacobian. For exa
1. Theory: The paper provides a solid theoretical foundation, including convergence proofs for JD with AUPGrad under certain conditions, which adds rigor to the proposed method.
2. Reproduction: The source codes are provided, which is helpful to validate the realistic performance.
3. Detailed introductions to baselines. Appendix B states the properties and computation steps of several compared algorithms, which helps to highlight and better understand the advantages of this work.
1. Lack of mathematical background. It’s kindly suggested to add more detailed introductions to the investigated multi-objective learning and definitions of Pareto fronts.
2. Theory. The theoretical guarantees provided for JD with AUPGrad rely on assumptions of smoothness and convexity, which may not hold in some practical scenarios, limiting the applicability of the results. More practical examples are required. Could you please further derive the convergence rate? Some computation complexity
Innovative Contribution: The JD method presents a clear and novel perspective to gradient descent for multi-objective optimization, an area of significant relevance to fields like multi-task learning and adversarial training.
Applicability to Emerging Areas: By addressing applications such as federated learning and distributed optimization, the paper suggests impactful future directions for JD and AUPGrad beyond standard optimization problems.
1. Not Very Well-Structured: The organization of the content in this article is inconsiderate. For example, in the beginning of Section 2, the definition of the relationship between vectors u and v is actually related to the Pareto optimal condition. It would have been more appropriate to clearly introduce the definitions related to Pareto before presenting the definitions used in this paper. However, the paper provides the Pareto optimal condition for the first time in Section 2.4. If the rea
A new algorithm has been proposed to reduce complexity using the Gramian of the Jacobian matrix
1. First of all, the writing of this paper is disastrous due to the lack of necessary introduction to multi-objective algorithms. For those unfamiliar with multi-objective algorithms, especially Pareto optimality, it is really difficult to understand this paper. The Pareto frontier, as an important concept in this paper, is not introduced at all. Additionally, there is a significant lack of explanation for various details. For example, in Figure 1, we are dealing with a multi-objective optimizat
Taxonomy
Topics: Advanced Multi-Objective Optimization Algorithms · Advanced Control Systems Optimization · Advanced Optimization Algorithms Research
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
