MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities

Jingxue Chen; Qingkun Tang; Qianchun Lu; Siyuan Fang

arXiv:2505.12043 · cs.CL · May 21, 2025

Open Access

TL;DR

This paper introduces MoL (Mixture of Losses), a dual-loss framework for continual pre-training of LLMs. It applies cross-entropy loss to the domain corpus and KL divergence against the base model to the general corpus, so the model gains domain expertise while keeping its general skills.

Contribution

The paper proposes a dual-loss architecture for LLM training that separates domain-specific and general knowledge optimization, improving domain adaptation without sacrificing general capabilities.

Findings

01. Achieves 27.9% higher accuracy on the Math-500 benchmark

02. Improves AIME25 performance by 83.3% in think mode

03. Optimal domain-to-general corpus ratio is 1:1 for balanced training

Abstract

Although large language models (LLMs) perform well in general tasks, domain-specific applications suffer from hallucinations and accuracy limitations. Continual Pre-Training (CPT) approaches encounter two key issues: (1) domain-biased data degrades general language skills, and (2) improper corpus-mixture ratios limit effective adaptation. To address these, we propose a novel framework, Mixture of Losses (MoL), which decouples optimization objectives for domain-specific and general corpora. Specifically, cross-entropy (CE) loss is applied to domain-corpus to ensure knowledge acquisition, while Kullback-Leibler (KL) divergence aligns general-corpus training with the base model's foundational capabilities. This dual-loss architecture preserves universal skills while enhancing domain expertise, avoiding catastrophic forgetting. Empirically, we validate that a 1:1 domain-to-general corpus…
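The dual-loss idea described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the batch layout, the KL direction (base model to trained model), and the equal weighting of the two loss terms are all assumptions made for illustration, and a real training loop would use a tensor library over full token sequences.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(model_logits, target):
    """CE loss for one token: negative log-probability of the target id."""
    return -log_softmax(model_logits)[target]

def kl_divergence(base_logits, model_logits):
    """KL(base || model) for one token (direction is an assumption)."""
    base_lp = log_softmax(base_logits)
    model_lp = log_softmax(model_logits)
    return sum(math.exp(b) * (b - m) for b, m in zip(base_lp, model_lp))

def mol_loss(batch):
    """Route each example to CE (domain) or KL (general) and average.

    Each example is a dict with hypothetical keys:
      source: "domain" or "general"
      model_logits: logits from the model being trained
      target: gold token id (domain examples only)
      base_logits: logits from the frozen base model (general examples only)
    """
    total = 0.0
    for ex in batch:
        if ex["source"] == "domain":
            total += cross_entropy(ex["model_logits"], ex["target"])
        else:
            total += kl_divergence(ex["base_logits"], ex["model_logits"])
    return total / len(batch)

# A toy batch mixed 1:1 between domain and general examples.
example_batch = [
    {"source": "domain", "model_logits": [0.0, 0.0], "target": 0},
    {"source": "general", "model_logits": [1.0, 2.0], "base_logits": [1.0, 2.0]},
]
print(mol_loss(example_batch))
```

In this toy batch the general example contributes zero loss because the trained model's logits match the base model's exactly. That is the sense in which the KL term anchors general-corpus behavior to the base model, while the CE term alone drives learning on the domain example.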

Taxonomy

Topics: Service-Oriented Architecture and Web Services · Semantic Web and Ontologies · AI-based Problem Solving and Planning

Methods: Balanced Selection