Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance
Hao Chen, Yusen Wu, Phuong Nguyen, Chao Liu, Yelena Yesha

TL;DR
This paper introduces a soft merging technique for neural networks that efficiently combines multiple models, improves performance, and enhances robustness against malicious models by learning gate parameters without altering the original weights.
Contribution
The proposed soft merging method enables rapid, robust, and cost-effective merging of neural network models through learned gating, avoiding weight modifications and improving convergence.
Findings
Merging improves model performance and convergence.
Method is robust against malicious models.
Experiments show superior performance of merged networks.
Abstract
Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, is often limited to converging to local optima due to the non-convex nature of the problem. Leveraging these local optima to improve model performance remains a challenging task. Given the inherent complexity of neural networks, the simple arithmetic averaging of the obtained local optima models leads to undesirable results. This paper proposes a soft merging method that facilitates rapid merging of multiple models, simplifies the merging of specific parts of neural networks, and enhances robustness against malicious models with extreme values. This is achieved by learning gate parameters through a surrogate of the L0 norm using the hard concrete distribution, without modifying the model weights of the given local optima models. This merging process not only enhances the model performance by…
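The gating idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the hard concrete gate follows the commonly used stretched-sigmoid formulation for L0 regularization, and the constants (GAMMA, ZETA, BETA), the function names, and the weighted-sum merge rule in soft_merge_forward are all assumptions made for illustration. Only the gate logits (log_alpha) would be learned; the model weights stay frozen.

```python
import numpy as np

# Assumed stretch limits and temperature (common defaults, not taken from the paper).
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_gate(log_alpha, rng):
    """Draw a stochastic hard concrete gate in [0, 1] (training-time form)."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=np.shape(log_alpha))
    s = sigmoid((np.log(u) - np.log(1.0 - u) + log_alpha) / BETA)
    # Stretch to (GAMMA, ZETA), then clip so exact 0 and 1 are reachable.
    return np.clip(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def deterministic_gate(log_alpha):
    """Noise-free gate, typically used at evaluation time."""
    return np.clip(sigmoid(log_alpha) * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def expected_l0(log_alpha):
    """Probability that each gate is non-zero: the differentiable L0 surrogate."""
    return sigmoid(log_alpha - BETA * np.log(-GAMMA / ZETA))

def soft_merge_forward(x, weights, gates):
    """Hypothetical merge rule: gate-weighted sum of frozen linear layers."""
    return sum(g * (x @ w) for g, w in zip(gates, weights))

# Two frozen "models": one ordinary, one with extreme (malicious) values.
rng = np.random.default_rng(0)
w_good = rng.normal(size=(4, 3))
w_bad = 1e6 * np.ones((4, 3))
x = rng.normal(size=(2, 4))

# Gate logits as they might look after training: keep model 0, drop model 1.
log_alpha = np.array([5.0, -5.0])
gates = deterministic_gate(log_alpha)
merged = soft_merge_forward(x, [w_good, w_bad], gates)
```

With these logits the second gate clips to exactly zero, so the extreme-valued model contributes nothing to the merged output while neither weight matrix is modified. In a training loop, sample_gate would supply the gates and expected_l0(log_alpha).sum() would be added to the loss as the sparsity penalty.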
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Neural Networks and Applications
