From Atoms to Chains: Divergence-Guided Reasoning Curriculum for Unlabeled LLM Domain Adaptation

Yongqi Wang; Xiaofeng Ji; Jie Wang; Qingbin Li; Xiao Xiong; Zheming Yang; Jian Xu; Minghui Qiu; Xinxiao Wu

arXiv:2601.19588 · cs.LG · January 28, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces Divergence-Guided Reasoning Curriculum (DGRC), a method for adapting LLMs to specialized domains without labeled data. It uses disagreements between a teacher and a student model to generate atomic question-answer pairs, which improves the student's reasoning accuracy.

Contribution

The paper proposes a divergence-guided curriculum that builds two curricula from model disagreements, one of atomic questions and one of reasoning chains, to improve unlabeled domain adaptation of LLMs.

Findings

01 · Achieves a 7.76% relative improvement on a 1.5B medical-domain model

02 · Constructs atomic and chain curricula from reasoning disagreements

03 · Demonstrates success across the medical and legal domains
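
The 7.76% figure is a relative gain, not an absolute one. As a reminder of the distinction, and assuming the standard definition (this summary does not state the baseline the gain is measured against, so the symbols below are illustrative):

```latex
\[
\text{relative improvement}
  = \frac{\mathrm{Acc}_{\text{DGRC}} - \mathrm{Acc}_{\text{base}}}{\mathrm{Acc}_{\text{base}}} \times 100\%
\]
```

Here $\mathrm{Acc}_{\text{DGRC}}$ and $\mathrm{Acc}_{\text{base}}$ are hypothetical names for the accuracy of the adapted student and of the comparison baseline.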

Abstract

Adapting Large Language Models (LLMs) to specialized domains without human-annotated data is a crucial yet formidable challenge. Widely adopted knowledge distillation methods often devolve into coarse-grained mimicry, where the student model inefficiently targets its own weaknesses and risks inheriting the teacher's reasoning flaws. This exposes a critical pedagogical dilemma: how to devise a reliable curriculum when the teacher itself is not an infallible expert. Our work resolves this by capitalizing on a key insight: while LLMs may exhibit fallibility in complex, holistic reasoning, they often exhibit high fidelity on focused, atomic sub-problems. Based on this, we propose Divergence-Guided Reasoning Curriculum (DGRC), which constructs a learning path from atomic knowledge to reasoning chains by dynamically deriving two complementary curricula from disagreements in reasoning…
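
The divergence-driven idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `find_divergent`, `build_atomic_curriculum`, `decompose`, and the normalized exact-match comparison are all hypothetical, and the toy `teacher` and `student` callables stand in for real LLM calls. The reasoning-chain curriculum, any quality filtering, and the fine-tuning step are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface: a model maps a prompt string to an answer string.
Model = Callable[[str], str]


@dataclass
class AtomicQA:
    question: str
    answer: str


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(answer.lower().split())


def find_divergent(
    questions: List[str], teacher: Model, student: Model
) -> List[Tuple[str, str, str]]:
    """Keep only the questions on which teacher and student answers differ."""
    divergent = []
    for q in questions:
        t_ans, s_ans = teacher(q), student(q)
        if normalize(t_ans) != normalize(s_ans):
            divergent.append((q, t_ans, s_ans))
    return divergent


def build_atomic_curriculum(
    divergent: List[Tuple[str, str, str]],
    decompose: Callable[[str, str, str], List[str]],
    teacher: Model,
) -> List[AtomicQA]:
    """Turn each disagreement into focused sub-questions answered by the teacher.

    Assumption: the teacher is more reliable on atomic sub-questions than on
    the full problem, which is the insight the abstract relies on.
    """
    curriculum = []
    for q, t_ans, s_ans in divergent:
        for sub_q in decompose(q, t_ans, s_ans):
            curriculum.append(AtomicQA(sub_q, teacher(sub_q)))
    return curriculum


# Toy stand-ins for real models (hypothetical data, for illustration only).
teacher_answers = {
    "Q1": "A",
    "Q2": "B",
    "sub: Q2 step 1": "fact-1",
    "sub: Q2 step 2": "fact-2",
}
student_answers = {"Q1": "a", "Q2": "C"}


def teacher(prompt: str) -> str:
    return teacher_answers[prompt]


def student(prompt: str) -> str:
    return student_answers.get(prompt, "")


def decompose(question: str, t_ans: str, s_ans: str) -> List[str]:
    # A real system would prompt an LLM with both answers; here we fake it.
    return [f"sub: {question} step 1", f"sub: {question} step 2"]


divergent = find_divergent(["Q1", "Q2"], teacher, student)
curriculum = build_atomic_curriculum(divergent, decompose, teacher)
```

In this toy run only `Q2` contributes curriculum items, because the two models agree on `Q1` once case is normalized. A case where teacher and student share the same wrong answer would produce no divergence and therefore no curriculum items.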

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

Originality: The formulation of the "cognitive asymmetry" principle and its operationalization through a dynamic, disagreement-driven curriculum is a highly original and insightful contribution. The shift from passive mimicry to active diagnosis and repair is a powerful conceptual advance.

Quality: The work is technically sound and executed with high quality. The experimental validation is comprehensive, spanning multiple domains, model sizes, and configurations (including self-teaching and RL)

Weaknesses

Incomplete Comparative Analysis: The most significant weakness is the lack of comparison with other modern unlabeled adaptation techniques. For instance, how does DGRC compare to self-training methods like Self-Rewarding Tuning or rejection sampling fine-tuning? Without these comparisons, it is challenging to gauge the true standing of the proposed method within the existing landscape.

Limited Scale of Human Evaluation: The manual assessment of the atomic curriculum's quality, while positive, i

Reviewer 02 · Rating 4 · Confidence 2

Strengths

First, the core insight of "cognitive asymmetry" is well-validated and innovative, effectively resolving the pedagogical dilemma of learning from an imperfect teacher by shifting focus from flawed holistic reasoning to reliable atomic knowledge. Second, the three-stage DGRC framework (divergence detection, curriculum generation, student adaptation) is structurally rigorous, with multi-step filtering (e.g., IFD and LLM-based evaluation) ensuring high-quality curricula and addressing limitations

Weaknesses

First, DGRC heavily depends on teacher model capability, as shown by the significant performance gap when using strong proprietary teachers (e.g., GPT-4.1) versus weaker open-source ones, limiting its applicability in scenarios where access to advanced teachers is constrained. Second, the framework fails to address "shared blind spots" where both teacher and student make the same error, as divergence (the trigger for curriculum generation) is absent, leaving such critical flaws undetected. T

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. The studied problem of unlabeled domain adaptation is interesting and of practical use.
2. The proposed method follows a cognitive asymmetry principle that is reasonable.
3. The experiments are extensive and persuasive to show the model effectiveness.

Weaknesses

1. The proposed method lacks novelty.
2. Some important experiments are missing.
3. Some details are missing, which decreases the readability of the paper.

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education