EDCO: Dynamic Curriculum Orchestration for Domain-specific Large Language Model Fine-tuning
Jing-Cheng Pang, Liu Sun, Chang Zhou, Xian Tang, Haichuan Ma, Kun Jiang, Jianlong Wang, Kai Zhang, Sijie Wu, Haoran Cai, Chenwei Wu, Xubin Li, Xin Chen

TL;DR
EDCO introduces a dynamic curriculum learning framework for domain-specific LLM fine-tuning that adaptively selects high-entropy samples, improving training efficiency and model performance across multiple domains.
Contribution
This paper presents EDCO, a novel dynamic curriculum orchestration method based on inference entropy, which adapts sample selection during fine-tuning, unlike static curricula in prior work.
Findings
EDCO outperforms traditional static curriculum strategies in multiple domains.
Efficient entropy estimation reduces computational time by 83.5%.
EDCO enhances long-term reasoning in LLMs during fine-tuning.
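The efficiency finding is about estimating inference entropy cheaply. Below is a minimal Python sketch of that idea. It assumes inference entropy is the mean Shannon entropy of the model's next-token distributions over a generated answer, and that the cheaper estimate averages over only the first `prefix_len` tokens. The reviews call this cheaper estimate a prefix-based entropy approximation. The function names, `prefix_len`, and the toy distributions are hypothetical illustrations, not the paper's implementation. The sketch does not reproduce the reported 83.5% figure.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(step_dists: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over a generated answer (one distribution per step)."""
    if len(step_dists) == 0:
        return 0.0
    return sum(token_entropy(d) for d in step_dists) / len(step_dists)


def prefix_entropy(step_dists: Sequence[Sequence[float]], prefix_len: int) -> float:
    """Approximate the answer's entropy from its first `prefix_len` decoding steps.

    Assumption: in a real pipeline the saving comes from stopping generation
    after `prefix_len` tokens; slicing a precomputed list only mimics that.
    """
    return mean_entropy(step_dists[:prefix_len])


# Hypothetical per-step distributions for a 4-token answer: two uncertain
# steps followed by two fully confident ones.
dists = [[0.5, 0.5], [0.5, 0.5], [1.0], [1.0]]
full = mean_entropy(dists)            # ln(2) / 2
approx = prefix_entropy(dists, 2)     # ln(2)
print(f"full={full:.4f} prefix={approx:.4f}")  # full=0.3466 prefix=0.6931
```

For curriculum ranking only the relative order of samples matters. A prefix estimate can therefore be useful even when its absolute value differs from the full-answer entropy, as it does in the toy data. How well the two orderings agree on real data is an empirical question.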
Abstract
Domain-specific large language models (LLMs), typically developed by fine-tuning a pre-trained general-purpose LLM on specialized datasets, represent a significant advancement in applied AI. A common strategy in LLM fine-tuning is curriculum learning, which pre-orders training samples based on metrics like difficulty to improve learning efficiency compared to a random sampling strategy. However, most existing methods for LLM fine-tuning rely on a static curriculum, designed prior to training, which lacks adaptability to the model's evolving needs during fine-tuning. To address this, we propose EDCO, a novel framework based on two key concepts: inference entropy and dynamic curriculum orchestration. Inspired by recent findings that maintaining high answer entropy benefits long-term reasoning gains, EDCO prioritizes samples with high inference entropy in a continuously adapted curriculum.…
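The abstract's core loop re-ranks samples by their current inference entropy and trains on the highest-entropy ones. It can be sketched as follows. This minimal sketch assumes plain top-`k` selection per round and a caller-supplied `entropy_of` scorer, for example the mean token entropy of the model's answer. `dynamic_curriculum`, `toy_train_step`, and the 0.2 decay factor are hypothetical stand-ins. The paper's actual schedule, batch size, and re-scoring frequency are not specified here.

```python
import heapq
from typing import Callable, Dict, List, Sequence


def select_high_entropy(
    pool: Sequence[str], entropy_of: Callable[[str], float], k: int
) -> List[str]:
    """Return the k samples whose current inference entropy is highest."""
    return heapq.nlargest(k, pool, key=entropy_of)


def dynamic_curriculum(
    pool: Sequence[str],
    entropy_of: Callable[[str], float],
    train_step: Callable[[List[str]], None],
    rounds: int,
    k: int,
) -> List[List[str]]:
    """Re-score the whole pool with the current model every round, then train on the top-k.

    A static curriculum would sort `pool` once before training. Here the ranking
    is recomputed after each update, so it tracks the model's evolving state.
    """
    history: List[List[str]] = []
    for _ in range(rounds):
        batch = select_high_entropy(pool, entropy_of, k)
        train_step(batch)
        history.append(batch)
    return history


# Toy stand-in for a model (hypothetical): each sample's entropy drops once it
# is trained on, mimicking the model growing confident on that sample.
entropies: Dict[str, float] = {"a": 2.0, "b": 1.5, "c": 1.0, "d": 0.5}


def toy_train_step(batch: List[str]) -> None:
    for sample in batch:
        entropies[sample] *= 0.2


history = dynamic_curriculum(
    pool=list(entropies),
    entropy_of=entropies.__getitem__,  # always reads the latest entropies
    train_step=toy_train_step,
    rounds=3,
    k=2,
)
print(history)  # [['a', 'b'], ['c', 'd'], ['a', 'b']]
```

The pool is re-scored after every update. Samples the model has become confident on therefore drop out of the selection (round 2). They can return once other samples fall below them (round 3). A curriculum sorted once before training cannot adapt this way.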
Peer Reviews
Decision: Submitted to ICLR 2026
- Introduces a reverse-curriculum strategy that focuses on hard, high-entropy samples to efficiently specialize pretrained LLMs, countering the "easy-to-hard" convention. It is also well motivated by the entropy-collapse phenomenon in RL-trained LLMs, aligning with recent work on exploration-preserving training.
- Prefix-Entropy Approximation is a practical innovation that enables dynamic curricula at scale.
- Strong empirical improvements.
- Overall, the problem is interesting and addresses a pressing
- The method is tested only on telecom and wireless engineering tasks. I would like to see evaluation results on a qualitatively different domain, such as medical reasoning, legal tasks, math, or code, to strengthen the generality claims.
- There is a potential over-focus on high-entropy outliers. The method selects top-entropy samples, but high entropy may arise from nonsensical edge cases or OOD errors, which could lead to overfitting to pathological difficulty zones. Maybe the authors can test threshold-b
Strengths:
1. The paper addresses the well-known issue of entropy collapse in RL-based fine-tuning.
2. The paper provides clear implementation details. For example, the integration of prefix-based entropy approximation and quick-answer prompting is both innovative and practical, yielding large computational savings.
3. The experiments are carefully designed, demonstrating consistent improvements across supervised and RL settings. The dynamic of entropy and number of new
To me, there are two main limitations. First, the method is both heuristic and incremental relative to existing curriculum and entropy-based sampling approaches. Given this, comprehensive experiments are usually necessary. However, the experiments are narrow in scope, limited to Qwen models and communication domains, which makes it unclear whether the method generalizes beyond this setting. It would be appreciated if the authors could provide experiments using other models, and ex
1. First, the motivation of finding a dynamic approach that works in practice is valuable. Moreover, the paper introduces a good approach by combining a quick-answer prompting step with prefix-only entropy, effectively reducing the computational load of selection during training. This addresses a limitation of dynamic curricula, enabling more frequent re-ranking and thereby enhancing their practical applicability in standard training pipelines, both in supervised fine tu
1. The main contributions are engineering improvements built from known parts, such as uncertainty-based selection, quick-answer prompting, and prefix truncation, rather than a new objective or learning principle.
2. Results are limited to two communication domains with synthetic datasets, with 12k high-quality training samples and only 230 test samples per domain. Evidence of generalization to other domains and standard benchmarks is missing.
3. Most evaluations compare against static curricula rather
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
