Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration
Junqi Gao, Zhichang Guo, Dazhi Zhang, Dong Li, Runze Liu, Pengfei Li, Kai Tian, Biqing Qi

TL;DR
Bohdi introduces a synthetic-data-only framework for heterogeneous LLM fusion that automatically explores knowledge domains and adaptively allocates data, significantly improving knowledge integration and capability balance across diverse LLMs.
Contribution
The paper presents Bohdi, a novel framework that uses hierarchical domain exploration and adaptive data sampling via multi-armed bandits for effective LLM fusion without real data.
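As a rough illustration of how a multi-armed bandit can steer data allocation across domains, the sketch below uses the standard UCB1 rule. It is not the paper's algorithm: the class name DomainBandit, the domain names, the weakness probabilities, and the reward definition (1 when the target model is judged to have answered poorly in the sampled domain) are all assumptions made for this example.

```python
import math
import random


class DomainBandit:
    """UCB1 bandit over knowledge domains (illustrative, not Bohdi's method)."""

    def __init__(self, domains, exploration=1.0):
        self.domains = list(domains)
        self.exploration = exploration
        self.counts = {d: 0 for d in self.domains}
        self.mean_reward = {d: 0.0 for d in self.domains}
        self.total = 0

    def select(self):
        # Sample every domain once before applying the UCB rule.
        for d in self.domains:
            if self.counts[d] == 0:
                return d

        def ucb(d):
            bonus = self.exploration * math.sqrt(
                2.0 * math.log(self.total) / self.counts[d]
            )
            return self.mean_reward[d] + bonus

        return max(self.domains, key=ucb)

    def update(self, domain, reward):
        # Incremental update of the running mean reward for this domain.
        self.counts[domain] += 1
        self.total += 1
        n = self.counts[domain]
        self.mean_reward[domain] += (reward - self.mean_reward[domain]) / n


def simulate(rounds=300, seed=0):
    rng = random.Random(seed)
    # Hypothetical probability that the target model answers poorly per domain.
    weakness = {"math": 0.8, "coding": 0.5, "history": 0.1}
    bandit = DomainBandit(weakness)
    for _ in range(rounds):
        domain = bandit.select()
        # Assumed reward: 1 if the target model is found lacking here.
        reward = 1.0 if rng.random() < weakness[domain] else 0.0
        bandit.update(domain, reward)
    return bandit.counts


if __name__ == "__main__":
    print(simulate())
```

Under these assumptions, domains where the simulated target model is weaker accumulate more samples over time, which is the general intuition behind adaptive allocation. A real system would replace the simulated reward with an actual comparison between source and target model outputs.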
Findings
Outperforms existing methods on multiple benchmarks
Achieves higher data efficiency and capability balance
Effectively adapts to target LLM performance shifts
Abstract
Heterogeneous Large Language Model (LLM) fusion integrates the strengths of multiple source LLMs with different architectures into a target LLM with low computational overhead. While promising, existing methods suffer from two major limitations: 1) reliance on real data from a limited set of domains for knowledge fusion, which prevents the target LLM from fully acquiring knowledge across diverse domains, and 2) fixed data allocation proportions across domains, which cannot be adjusted dynamically to the target LLM's varying capabilities across domains and thus lead to a capability imbalance. To overcome these limitations, we propose Bohdi, a synthetic-data-only heterogeneous LLM fusion framework. Through the organization of knowledge domains into a hierarchical tree structure, Bohdi enables automatic domain exploration and multi-domain data generation through multi-model collaboration, thereby…
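The hierarchical organization of domains described in the abstract can be pictured with a small tree structure. The sketch below is only an illustration under stated assumptions: DomainNode, expand, and the propose_subdomains callback (a stand-in for whatever model call would suggest sub-domains) are hypothetical names, and the fixed lookup table replaces any real LLM.

```python
from dataclasses import dataclass, field


@dataclass
class DomainNode:
    """One knowledge domain; children are its sub-domains (illustrative)."""
    name: str
    children: list = field(default_factory=list)

    def add_child(self, name):
        child = DomainNode(name)
        self.children.append(child)
        return child

    def leaf_paths(self, prefix=()):
        # Return the root-to-leaf path of every leaf domain under this node.
        path = prefix + (self.name,)
        if not self.children:
            return [path]
        paths = []
        for child in self.children:
            paths.extend(child.leaf_paths(path))
        return paths


def expand(node, propose_subdomains, depth):
    """Grow the tree by asking a proposer for sub-domains, down to `depth`."""
    if depth == 0:
        return
    for name in propose_subdomains(node.name):
        expand(node.add_child(name), propose_subdomains, depth - 1)


# Hypothetical proposer: a fixed table standing in for a model's suggestions.
TABLE = {
    "knowledge": ["math", "coding"],
    "math": ["algebra", "geometry"],
    "coding": ["python"],
}


def propose_subdomains(name):
    return TABLE.get(name, [])


root = DomainNode("knowledge")
expand(root, propose_subdomains, depth=2)
paths = root.leaf_paths()
```

Each leaf path, such as ("knowledge", "math", "algebra"), could then serve as a prompt context for generating synthetic question-answer pairs in that sub-domain.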
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Artificial Intelligence in Healthcare and Education
