Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

Anke Tang; Li Shen; Yong Luo; Shiwei Liu; Han Hu; Bo Du

arXiv:2406.09770 · cs.LG · June 17, 2024

TL;DR

This paper introduces a scalable mixture-of-experts (MoE) approach for efficiently approximating the Pareto front in multi-objective optimization of large neural networks, enabling effective trade-off analysis with minimal additional computational cost.

Contribution

The paper proposes a novel MoE-based model fusion method that captures objective trade-offs and approximates the Pareto set efficiently for large models, outperforming existing methods in scalability and resource usage.
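One way to picture preference-conditioned model fusion is task arithmetic whose mixing weights come from a router: a preference vector r over the objectives is mapped to coefficients w(r), and the fused parameters are θ(r) = θ₀ + Σᵢ wᵢ(r) τᵢ, where τᵢ is the task vector (fine-tuned minus pre-trained weights) for objective i. The minimal Python sketch below assumes a single linear-plus-softmax router over a flattened parameter list. The names `PreferenceRouter` and `fuse` are hypothetical; this illustrates the general idea only and is not the authors' implementation, which lives in the official repository.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class PreferenceRouter:
    """Hypothetical linear router: preference vector (m objectives) -> weights over m experts."""

    def __init__(self, weight: List[Vector], bias: Vector) -> None:
        # weight: one row per expert, one column per objective; bias: one entry per expert
        self.weight = weight
        self.bias = bias

    def __call__(self, preference: Sequence[float]) -> Vector:
        logits = [
            sum(w * p for w, p in zip(row, preference)) + b
            for row, b in zip(self.weight, self.bias)
        ]
        return softmax(logits)

    def num_params(self) -> int:
        # Router size depends only on the number of objectives/experts, not on model size
        return sum(len(row) for row in self.weight) + len(self.bias)


def fuse(base: Sequence[float], task_vectors: Sequence[Sequence[float]],
         coeffs: Sequence[float]) -> Vector:
    """theta(r) = theta_0 + sum_i w_i(r) * tau_i over a flattened parameter list."""
    fused = list(base)
    for tau, c in zip(task_vectors, coeffs):
        for k, t in enumerate(tau):
            fused[k] += c * t
    return fused


# Toy example: two objectives, identity router weights, three "parameters"
router = PreferenceRouter(weight=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
w = router([0.5, 0.5])                      # equal preference -> [0.5, 0.5]
theta = fuse([0.0, 0.0, 0.0], [[1.0, 0.0, 2.0], [0.0, 1.0, -2.0]], w)
print(theta)                                # [0.5, 0.5, 0.0]
print(router.num_params())                  # 6
```

Because this router maps an m-dimensional preference to m weights, it has m·m + m parameters regardless of how large θ is. That gives some intuition for why routers can stay small, although the paper's routers may be structured differently.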

Findings

1. Effectively approximates the Pareto front of large models.
2. Uses only hundreds of parameters for the routers, keeping memory overhead low.
3. Scales to multiple objectives and large model sizes.
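Approximating the front for several objectives usually means evaluating the fused model at many preference vectors. A common choice, assumed here rather than taken from the paper, is a uniform lattice on the probability simplex with step 1/d; the hypothetical helper `simplex_grid` below enumerates it with a stars-and-bars construction. For m objectives the lattice has C(d + m − 1, m − 1) points, so it grows quickly as m increases.

```python
from itertools import combinations
from typing import List, Tuple


def simplex_grid(num_objectives: int, divisions: int) -> List[Tuple[float, ...]]:
    """All preference vectors with entries in {0, 1/d, ..., 1} that sum to 1."""
    points = []
    # Stars and bars: place (num_objectives - 1) bars among the slots;
    # the gaps between consecutive bars give the integer counts per objective.
    slots = divisions + num_objectives - 1
    for bars in combinations(range(slots), num_objectives - 1):
        counts = []
        prev = -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(slots - prev - 1)
        points.append(tuple(c / divisions for c in counts))
    return points


print(simplex_grid(2, 2))       # [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
print(len(simplex_grid(3, 4)))  # 15
```

Each vector in the grid would then be passed to the fused model (or, for scalarization baselines, used to train a separate model), and the resulting loss vectors trace out the approximate front.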

Abstract

Solving multi-objective optimization problems for large deep neural networks is challenging due to the complexity of the loss landscape and the high computational cost of training and evaluating models. Efficient Pareto front approximation of large models enables multi-objective optimization for various tasks, such as multi-task learning and trade-off analysis. Existing algorithms for learning the Pareto set fall short in two ways: (1) evolutionary, hypernetwork, and hypervolume-maximization methods are computationally expensive and scale poorly to large models; (2) scalarization algorithms train a separate model for each objective ray, which is inefficient for learning the entire Pareto set and fails to capture the objective trade-offs effectively. Inspired by the recent success of model merging, we propose a practical and scalable approach to Pareto set learning…
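To make the terms above concrete: when minimizing losses, a loss vector a dominates b if it is no worse on every objective and strictly better on at least one, and the Pareto front is the set of non-dominated points. Linear scalarization collapses the objectives into one weighted sum along a single preference ray, so each scalarized training run yields one point on the front. The short sketch below uses hypothetical helper names and made-up loss values for illustration; it filters candidate loss vectors down to the non-dominated ones and picks the best candidate for one ray.

```python
from typing import List, Sequence, Tuple


def scalarize(losses: Sequence[float], preference: Sequence[float]) -> float:
    """Linear scalarization: weighted sum of objective losses along one preference ray."""
    return sum(p * l for p, l in zip(preference, losses))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a Pareto-dominates b (minimization): no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the non-dominated loss vectors; O(n^2), fine for small candidate sets."""
    # dominates(p, p) is False, so comparing a point with itself is harmless
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [(0.2, 0.9), (0.5, 0.5), (0.9, 0.2), (0.6, 0.6)]
print(pareto_front(candidates))  # [(0.2, 0.9), (0.5, 0.5), (0.9, 0.2)]
print(min(candidates, key=lambda l: scalarize(l, (0.5, 0.5))))  # (0.5, 0.5)
```

Note that a linear weighted sum can only reach points on the convex part of the front, which is one reason a single scalarized model per ray gives an incomplete picture of the trade-offs.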

Code & Models

Repositories

tanganke/pareto_set_learning (PyTorch, official)

Taxonomy

Topics: Reservoir Engineering and Simulation Methods · Neural Networks and Applications · Gaussian Processes and Bayesian Inference

Methods: Attention Is All You Need · Sparse Evolutionary Training · Cosine Annealing · Layer Normalization · Byte Pair Encoding · Attention Dropout · Weight Decay · Dropout · Adam