TL;DR
This paper proposes a dynamic collaboration method for multiple language models. It combines minimal complete semantic units (MCSU) with a distribution distance-based dynamic selection strategy (DDS) to improve reasoning and to address vocabulary misalignment between models.
Contribution
It introduces the concept of minimal complete semantic units (MCSU) and a distribution distance-based dynamic selection strategy (DDS) for effective multi-model collaboration.
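To make the idea of distance-based selection concrete, here is a minimal sketch of one token-level step. It is not the authors' DDS algorithm. It assumes that each model's next-token distribution is a token-to-probability map over a shared, already aligned vocabulary, and that distance is measured with Jensen-Shannon divergence. The helpers `js_divergence`, `select_models`, and `next_token` and the `max_dist` threshold are hypothetical. The sketch picks the most central model, keeps only the models whose distributions lie close to it, averages those, and takes the argmax token.

```python
import math
from typing import Dict, List

# Hypothetical representation: token -> probability over a shared, aligned vocabulary.
Dist = Dict[str, float]


def js_divergence(p: Dist, q: Dist) -> float:
    """Jensen-Shannon divergence (natural log) between two token distributions."""
    js = 0.0
    for tok in set(p) | set(q):
        pi, qi = p.get(tok, 0.0), q.get(tok, 0.0)
        mi = 0.5 * (pi + qi)
        if pi > 0:
            js += 0.5 * pi * math.log(pi / mi)
        if qi > 0:
            js += 0.5 * qi * math.log(qi / mi)
    return js


def select_models(dists: List[Dist], max_dist: float) -> List[int]:
    """Assumed rule: anchor on the most central model, keep models within max_dist of it."""
    n = len(dists)
    mean_dist = [
        sum(js_divergence(dists[i], dists[j]) for j in range(n) if j != i) / max(n - 1, 1)
        for i in range(n)
    ]
    anchor = min(range(n), key=lambda i: mean_dist[i])
    return [i for i in range(n) if js_divergence(dists[anchor], dists[i]) <= max_dist]


def next_token(dists: List[Dist], max_dist: float = 0.1) -> str:
    """Average the selected models' distributions and return the most probable token."""
    kept = select_models(dists, max_dist)
    fused: Dist = {}
    for i in kept:
        for tok, prob in dists[i].items():
            fused[tok] = fused.get(tok, 0.0) + prob / len(kept)
    return max(fused, key=fused.get)


# Toy example: two models agree, one is an outlier.
a = {"Paris": 0.7, "Lyon": 0.2, "Rome": 0.1}
b = {"Paris": 0.6, "Lyon": 0.3, "Rome": 0.1}
c = {"Lyon": 0.9, "Paris": 0.05, "Rome": 0.05}
print(next_token([a, b, c]))                 # Paris: outlier c is dropped
print(next_token([a, b, c], max_dist=10.0))  # Lyon: naive average over all three
```

In the toy example, averaging all three models lets the outlier flip the answer. Selecting models by distribution distance keeps the two models that agree. This illustrates the abstract's point that adding more models does not automatically help.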
Findings
Outperforms existing methods on various benchmarks.
Effectively addresses vocabulary misalignment issues.
Enhances reasoning capabilities of language models.
Abstract
This paper investigates how token-level collaboration among multiple models can enhance the reasoning capabilities of language models. Our approach selects the optimal tokens from the next-token distributions provided by multiple models to perform autoregressive reasoning. Contrary to the assumption that more models yield better results, we introduce a distribution distance-based dynamic selection strategy (DDS) to optimize the multi-model collaboration process. To address the critical challenge of vocabulary misalignment in multi-model collaboration, we propose minimal complete semantic units (MCSU), a simple concept that lets multiple language models align naturally within the linguistic space. Experimental results across various benchmarks demonstrate the superiority of our method. The code will be available at https://github.com/Fanye12/DDS.
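Vocabulary misalignment arises because different tokenizers split the same text at different points. One plausible reading of a "minimal complete semantic unit" is the smallest span of text whose boundaries every model's tokenization shares. The sketch below uses that reading, which is an assumption and not the paper's definition. The helpers `boundaries` and `minimal_common_units` are hypothetical. The code intersects the character offsets at which each tokenization places a boundary and splits the text only at offsets common to all of them.

```python
from typing import List, Set


def boundaries(tokens: List[str]) -> Set[int]:
    """Character offsets at which a tokenization ends a token."""
    offsets, pos = set(), 0
    for tok in tokens:
        pos += len(tok)
        offsets.add(pos)
    return offsets


def minimal_common_units(text: str, tokenizations: List[List[str]]) -> List[str]:
    """Split text at the boundaries shared by every tokenization (assumed MCSU reading)."""
    for toks in tokenizations:
        if "".join(toks) != text:
            raise ValueError("each tokenization must reproduce the text exactly")
    common = set.intersection(*(boundaries(t) for t in tokenizations))
    units, start = [], 0
    for end in sorted(common):
        units.append(text[start:end])
        start = end
    return units


# Two hypothetical tokenizers that disagree inside words but agree at word edges.
text = "unbelievable results"
tok1 = ["un", "believ", "able", " results"]
tok2 = ["unbel", "ievable", " res", "ults"]
print(minimal_common_units(text, [tok1, tok2]))  # ['unbelievable', ' results']
```

Under this reading, the models can compare candidates unit by unit instead of token by token, so the selection step never needs a shared vocabulary.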