TL;DR
This paper introduces RCP-Merging, a novel framework that effectively combines long chain-of-thought reasoning models with domain-specific models, preserving reasoning ability while enhancing domain-specific task performance.
Contribution
The paper proposes a new merging method that integrates domain knowledge into a long chain-of-thought model while preserving its reasoning capability, without significant performance degradation.
Findings
Improves domain task performance by over 9%.
Maintains core reasoning capabilities effectively.
Outperforms existing merging methods.
Abstract
Large Language Models (LLMs) with long chain-of-thought (CoT) capability, termed Reasoning Models, demonstrate superior problem-solving abilities on intricate tasks through multi-step long CoT reasoning. To create a dual-capability model with both long CoT capability and domain-specific knowledge without substantial computational and data costs, model merging emerges as a highly resource-efficient method. However, merging domain-specific LLMs with long CoT ones poses significant challenges, since current merging methods suffer from reasoning capability degradation and can even produce gibberish output or output collapse. To overcome this, we introduce RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior, a novel merging framework designed to integrate domain-specific LLMs with long CoT capability while maintaining model performance in the…
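For readers unfamiliar with model merging, the sketch below shows what combining two fine-tuned models in weight space can look like, with no additional training. It illustrates plain task arithmetic, a generic baseline, and is not the RCP-Merging procedure, which the abstract above does not detail. It assumes both models were fine-tuned from the same base checkpoint; the function names, the `alpha` and `beta` scaling factors, and the toy weights are hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def task_vector(finetuned: Weights, base: Weights) -> Weights:
    """Difference between a fine-tuned model and the base it started from."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge_task_arithmetic(
    base: Weights,
    reasoning: Weights,
    domain: Weights,
    alpha: float = 1.0,  # hypothetical scale for the reasoning task vector
    beta: float = 1.0,   # hypothetical scale for the domain task vector
) -> Weights:
    """Add both scaled task vectors to the base weights (generic baseline)."""
    tv_reasoning = task_vector(reasoning, base)
    tv_domain = task_vector(domain, base)
    return {
        name: [
            b + alpha * r + beta * d
            for b, r, d in zip(base[name], tv_reasoning[name], tv_domain[name])
        ]
        for name in base
    }


# Toy example: each fine-tuned model changes a different weight.
base = {"w": [1.0, 2.0]}
reasoning = {"w": [1.5, 2.0]}
domain = {"w": [1.0, 3.0]}
merged = merge_task_arithmetic(base, reasoning, domain)
```

In this toy case the two task vectors touch different weights, so nothing conflicts. When both models change the same weights in different directions, naive addition can overwrite the changes that carry reasoning behavior, which is the kind of degradation the abstract attributes to current merging methods.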