Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs
Chen Zheng, Ke Sun, Xun Zhou

TL;DR
Mistral-C2F introduces a two-step coarse-to-fine actor approach that enhances small LLMs' analytical and reasoning capabilities through RLHF, continuous maximization, and knowledge residue merging, outperforming larger models.
Contribution
The paper proposes a novel coarse-to-fine actor framework with continuous maximization and knowledge residue merging to significantly improve small LLMs' analytical and conversational performance.
Findings
Outperforms similar-scale models on 11 language tasks.
Excels on the MT-Bench dialogue task.
Enhances reasoning and reduces redundancies in small LLMs.
Abstract
Despite the advances in Large Language Models (LLMs), exemplified by models like GPT-4 and Claude, smaller-scale LLMs such as Llama and Mistral often struggle with generating in-depth and coherent dialogues. This paper presents a novel two-step Coarse-to-Fine Actor model to address the inherent limitations in conversational and analytical capabilities of small-sized LLMs. Our approach begins with the Policy-based Coarse Actor, employing a technique we term "Continuous Maximization". The Coarse Actor establishes an enhanced, knowledge-rich pool adept at aligning with human preference styles in analysis and reasoning. Through the RLHF process, it employs Continuous Maximization, a strategy that dynamically and adaptively extends the output length limit, enabling the generation of more detailed and analytical content. Subsequently, the Fine Actor refines this analytical content, addressing…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Fuzzy Logic and Control Systems · Fault Detection and Control Systems
Methods: Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer
