AP-BMM: Approximating Capability-Cost Pareto Sets of LLMs via Asynchronous Prior-Guided Bayesian Model Merging

Kesheng Chen; Yamin Hu; Zhenqian Zhu; Yiya Diao; Wenjian Luo

arXiv:2512.09972·cs.LG·May 14, 2026

TL;DR

AP-BMM introduces an asynchronous Bayesian approach guided by model differences to efficiently generate a diverse set of LLMs balancing reasoning ability and inference cost.

Contribution

The paper presents an asynchronous, prior-guided Bayesian model merging method that improves Pareto-set coverage and GPU utilization in multi-objective LLM merging.

Findings

01 Achieves stronger Pareto-set quality under fixed evaluation budgets.

02 Broadens trade-off coverage compared to baseline methods.

03 Reduces wall-clock time through better GPU utilization.

Abstract

Serving Large Language Models (LLMs) often requires choosing between stronger reasoning and lower inference cost. Model merging offers a practical way to build several models between a reasoning-oriented model and a cheaper base model, but common model-level merging methods usually control this trade-off with only one or two global knobs. We study this setting as a multi-objective optimization problem: instead of producing one merged model, the goal is to find a set of merged models that cover different accuracy vs. token-cost preferences. Layer-wise merging is more flexible because it can assign different merge weights to different Transformer layers. However, it introduces two practical challenges. First, the layer-wise search space is large, and existing methods often search it without using helpful signals from the source models. Second, LLM evaluations can take very different amounts…
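To make the setting in the abstract concrete, here is a minimal sketch of the two ideas it names: assigning a separate merge weight to each layer, and keeping only the merged models that are not dominated on accuracy and token cost. Everything in it is an illustrative assumption rather than the paper's method: the merge operator is plain linear interpolation, the `Weights` type is a toy stand-in for real checkpoints, and the names `merge_layerwise` and `pareto_front` are hypothetical. The asynchronous, prior-guided Bayesian search itself is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy representation: layer index -> flat list of parameters.
Weights = Dict[int, List[float]]


def merge_layerwise(base: Weights, reasoner: Weights,
                    alphas: Sequence[float]) -> Weights:
    """Interpolate each layer with its own coefficient (assumed operator).

    alpha = 0 keeps the base layer, alpha = 1 keeps the reasoning layer.
    """
    if base.keys() != reasoner.keys() or len(alphas) != len(base):
        raise ValueError("need matching layers and one alpha per layer")
    merged: Weights = {}
    for alpha, layer in zip(alphas, sorted(base)):
        merged[layer] = [
            (1.0 - alpha) * b + alpha * r
            for b, r in zip(base[layer], reasoner[layer])
        ]
    return merged


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (accuracy, token_cost) points that no other point dominates.

    Higher accuracy is better; lower token cost is better.
    """
    front = []
    for i, (acc_i, cost_i) in enumerate(points):
        dominated = any(
            acc_j >= acc_i and cost_j <= cost_i
            and (acc_j > acc_i or cost_j < cost_i)
            for j, (acc_j, cost_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((acc_i, cost_i))
    return front
```

A model-level method would correspond to passing the same value for every entry of `alphas`; the layer-wise setting lets each entry vary, which is what makes the search space large. Each candidate `alphas` vector would be evaluated to get one (accuracy, token cost) point, and `pareto_front` then selects the set of trade-offs worth keeping.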

Code & Models

Repositories

MiLab-HITSZ/AP-BMM (GitHub)
