Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion
Anke Tang, Li Shen, Yong Luo, Shiwei Liu, Han Hu, Bo Du

TL;DR
This paper introduces a scalable mixture of experts approach for efficiently approximating the Pareto front in multi-objective optimization of large neural networks, enabling effective trade-off analysis with minimal additional computational cost.
Contribution
The paper proposes a novel MoE-based model fusion method that captures objective trade-offs and approximates the Pareto set efficiently for large models, outperforming existing methods in scalability and resource usage.
Findings
Effectively approximates Pareto front of large models
Uses only hundreds of parameters for routers, reducing memory
Scales to multiple objectives and large model sizes
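To give a feel for why a router can stay this small, here is a minimal sketch of preference-conditioned weight merging. It is not the authors' implementation. The two-layer router, the softmax mixing, the task-vector formulation, and all names and sizes (`num_objectives`, `hidden_dim`, `weight_shape`, `route`, `merge`) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

num_objectives = 3        # hypothetical number of objectives
hidden_dim = 16           # hypothetical router hidden width
weight_shape = (64, 64)   # hypothetical shape of one layer's weight matrix

# Frozen ingredients (assumed): a shared base weight and one task vector per objective.
base_weight = rng.normal(size=weight_shape)
task_vectors = rng.normal(size=(num_objectives, *weight_shape))

# The only trainable part in this sketch: a tiny two-layer MLP router.
router = {
    "w1": rng.normal(size=(num_objectives, hidden_dim)) * 0.1,
    "b1": np.zeros(hidden_dim),
    "w2": rng.normal(size=(hidden_dim, num_objectives)) * 0.1,
    "b2": np.zeros(num_objectives),
}


def route(pref):
    """Map a preference vector to softmax mixing weights over the task vectors."""
    hidden = np.maximum(pref @ router["w1"] + router["b1"], 0.0)  # ReLU
    logits = hidden @ router["w2"] + router["b2"]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


def merge(pref):
    """Build one layer's weights for the given preference vector."""
    mix = route(pref)
    # Weighted sum of task vectors added to the shared base weight.
    return base_weight + np.tensordot(mix, task_vectors, axes=1)


pref = np.array([0.2, 0.3, 0.5])
merged = merge(pref)
router_params = sum(p.size for p in router.values())
print(merged.shape, router_params, task_vectors.size)
```

In this toy setup the router holds far fewer parameters than the frozen task vectors it mixes, which is the kind of gap the finding above points to. Changing `pref` yields a different merged weight without retraining anything.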
Abstract
Solving multi-objective optimization problems for large deep neural networks is challenging because of the complexity of the loss landscape and the high computational cost of training and evaluating models. Efficient Pareto front approximation for large models enables multi-objective optimization in tasks such as multi-task learning and trade-off analysis. Existing algorithms for learning the Pareto set fall into two groups: (1) evolutionary, hypernetwork, and hypervolume-maximization methods, which are computationally expensive and scale poorly to large models; and (2) scalarization algorithms, which train a separate model for each objective ray, an approach that is inefficient for learning the entire Pareto set and fails to capture the objective trade-offs effectively. Inspired by the recent success of model merging, we propose a practical and scalable approach to Pareto set learning…
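The scalarization baseline mentioned in the abstract can be illustrated on a toy problem. The sketch below is an assumption-laden example, not code from the paper: the two quadratic objectives, the helper names (`objectives`, `solve_scalarized`, `is_dominated`), and the step size are all hypothetical. It runs one separate optimization per preference ray, which is the cost the abstract criticizes, and then checks that the resulting points are mutually non-dominated.

```python
import numpy as np


def objectives(x):
    """Two conflicting toy objectives, minimized at x = 1 and x = -1."""
    return np.array([(x - 1.0) ** 2, (x + 1.0) ** 2])


def solve_scalarized(pref, lr=0.1, steps=200):
    """Gradient descent on the weighted sum pref[0]*f1 + pref[1]*f2."""
    x = 0.0
    for _ in range(steps):
        grad = pref[0] * 2.0 * (x - 1.0) + pref[1] * 2.0 * (x + 1.0)
        x -= lr * grad
    return x


def is_dominated(point, others):
    """True if another point is no worse in every objective and better in one."""
    return any(np.all(o <= point) and np.any(o < point) for o in others)


# One independent solve per preference ray: cost grows with the number of rays.
prefs = [np.array([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 5)]
solutions = [solve_scalarized(p) for p in prefs]
losses = [objectives(x) for x in solutions]

for pref, x, loss in zip(prefs, solutions, losses):
    print(pref, round(x, 3), np.round(loss, 3), is_dominated(loss, losses))
```

Each ray yields a single point on the trade-off curve, so covering the curve densely means repeating the full solve many times. For a large network, each of those solves would be a full training run.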
Taxonomy
Topics: Reservoir Engineering and Simulation Methods · Neural Networks and Applications · Gaussian Processes and Bayesian Inference
Methods: Attention Is All You Need · Sparse Evolutionary Training · Cosine Annealing · Layer Normalization · Byte Pair Encoding · Attention Dropout · Weight Decay · Dropout · Adam
