MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities
Jingxue Chen, Qingkun Tang, Qianchun Lu, Siyuan Fang

TL;DR
This paper introduces MoL (Mixture of Losses), a dual-loss optimization framework for LLMs that enhances domain expertise while preserving general capabilities. It addresses the limitations of continual pre-training by decoupling the losses applied to domain-specific and general corpora.
Contribution
The paper proposes a dual-loss architecture for LLM training that optimizes domain-specific and general corpora under separate objectives (cross-entropy loss and KL divergence against the base model, respectively), improving domain adaptation without sacrificing general capabilities.
Findings
Achieves 27.9% higher accuracy on the Math-500 benchmark
Improves AIME25 performance by 83.3% in think mode
A 1:1 domain-to-general corpus ratio gives the best balance during training
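To make the 1:1 finding concrete, the following is a minimal sketch of how a balanced training mixture could be assembled. The function name `mix_one_to_one` is hypothetical, and counting documents rather than tokens is an assumption made for brevity; the summary does not say how the ratio is measured.

```python
import random

def mix_one_to_one(domain_docs, general_docs, seed=0):
    """Build a shuffled 1:1 mixture of domain and general documents.

    Assumption: the ratio is counted in documents. Each item is tagged with
    its source so a training loop can later pick the matching loss.
    """
    rng = random.Random(seed)
    # Use as many documents from each side as the smaller corpus allows.
    n = min(len(domain_docs), len(general_docs))
    domain = rng.sample(domain_docs, n)
    general = rng.sample(general_docs, n)
    mixed = [("domain", d) for d in domain] + [("general", g) for g in general]
    rng.shuffle(mixed)
    return mixed

if __name__ == "__main__":
    for source, doc in mix_one_to_one(["d1", "d2", "d3"], ["g1", "g2"], seed=1):
        print(source, doc)
```

Tagging each document with its source keeps the mixture step independent of the loss computation, so the ratio can be changed without touching the training objective.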
Abstract
Although large language models (LLMs) perform well on general tasks, domain-specific applications suffer from hallucinations and accuracy limitations. Continual Pre-Training (CPT) approaches encounter two key issues: (1) domain-biased data degrades general language skills, and (2) improper corpus-mixture ratios limit effective adaptation. To address these, we propose a novel framework, Mixture of Losses (MoL), which decouples optimization objectives for domain-specific and general corpora. Specifically, cross-entropy (CE) loss is applied to the domain corpus to ensure knowledge acquisition, while Kullback-Leibler (KL) divergence aligns general-corpus training with the base model's foundational capabilities. This dual-loss architecture preserves universal skills while enhancing domain expertise, avoiding catastrophic forgetting. Empirically, we validate that a 1:1 domain-to-general corpus…
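The dual-loss idea described in the abstract can be illustrated for a single token position. This is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the KL direction (base model relative to the trained model) is assumed, and any weighting between the two losses is omitted because the summary does not specify it.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """CE loss for one token: negative log-probability of the target id."""
    return -math.log(softmax(logits)[target])

def kl_divergence(base_logits, model_logits):
    """KL(p_base || p_model) for one token position (direction is assumed)."""
    p = softmax(base_logits)
    q = softmax(model_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def mol_token_loss(source, model_logits, target=None, base_logits=None):
    """Pick the loss by corpus type: CE for domain text, KL for general text."""
    if source == "domain":
        return cross_entropy(model_logits, target)
    if source == "general":
        return kl_divergence(base_logits, model_logits)
    raise ValueError(f"unknown corpus source: {source!r}")

if __name__ == "__main__":
    # Domain token: the model is pushed toward the ground-truth token id.
    print(mol_token_loss("domain", [2.0, 0.5, -1.0], target=0))
    # General token: the model is pulled toward the frozen base distribution.
    print(mol_token_loss("general", [2.0, 0.5, -1.0], base_logits=[1.5, 0.7, -0.8]))
```

In a real training loop the base-model logits would come from a frozen copy of the original model, and the per-token losses would be averaged over each batch.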
Taxonomy
Topics: Service-Oriented Architecture and Web Services · Semantic Web and Ontologies · AI-based Problem Solving and Planning
Methods: Balanced Selection
