Skrull: Towards Efficient Long Context Fine-tuning through Dynamic Data Scheduling

Hongtao Xu; Wenting Shen; Yuanxin Wei; Ang Wang; Guo Runfan; Tianxing Wang; Yong Li; Mingzhen Li; Weile Jia

arXiv:2505.19609 · cs.LG · December 16, 2025

TL;DR

Skrull introduces a dynamic data scheduling method that significantly improves training efficiency for long-context fine-tuning of large language models by balancing computation across heterogeneous sequence lengths.

Contribution

This paper presents Skrull, a novel lightweight data scheduler designed to optimize long-context fine-tuning by addressing data heterogeneity challenges in LLM training.

Findings

01 Skrull outperforms DeepSpeed by up to 7.54x in training efficiency.

02 The scheduling algorithm achieves near-zero online scheduling cost.

03 Experimental results validate the effectiveness of Skrull in real-world scenarios.

Abstract

Long-context supervised fine-tuning (Long-SFT) plays a vital role in enhancing the performance of large language models (LLMs) on long-context tasks. To adapt LLMs smoothly to long-context scenarios, this process typically entails training on mixed datasets containing both long and short sequences. However, this heterogeneous sequence length distribution poses significant challenges for existing training systems, which fail to achieve high training efficiency for long and short sequences simultaneously, resulting in sub-optimal end-to-end system performance in Long-SFT. In this paper, we present a novel perspective on data scheduling to address the challenges posed by the heterogeneous data distributions in Long-SFT. We propose Skrull, a dynamic data scheduler specifically designed for efficient Long-SFT. Through dynamic data scheduling, Skrull balances the computation…
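
To make the idea of balancing computation across heterogeneous sequence lengths concrete, here is a minimal Python sketch. It is not Skrull's algorithm, which the truncated abstract does not spell out. It is a generic greedy longest-first heuristic, and the names `estimated_cost` and `balance_sequences`, the cost weights, and the sample lengths are all hypothetical choices made for illustration.

```python
import heapq


def estimated_cost(seq_len, linear_weight=1.0, quadratic_weight=1e-3):
    """Hypothetical cost model (an assumption, not taken from the paper):
    a linear term for per-token work plus a quadratic term for attention."""
    return linear_weight * seq_len + quadratic_weight * seq_len * seq_len


def balance_sequences(seq_lens, num_workers):
    """Greedy longest-first assignment: visit sequences from longest to
    shortest and give each one to the currently least-loaded worker."""
    # Min-heap of (accumulated estimated cost, worker id).
    heap = [(0.0, worker) for worker in range(num_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]

    order = sorted(range(len(seq_lens)), key=lambda i: seq_lens[i], reverse=True)
    for idx in order:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(idx)
        heapq.heappush(heap, (load + estimated_cost(seq_lens[idx]), worker))

    loads = [sum(estimated_cost(seq_lens[i]) for i in idxs) for idxs in assignment]
    return assignment, loads


# Illustrative mixed batch: one very long sequence among many short ones.
sample_lens = [32000, 512, 1024, 256, 8000, 128, 4096, 2048]
sample_assignment, sample_loads = balance_sequences(sample_lens, num_workers=2)
```

With this sample batch, the single 32000-token sequence dominates the estimated cost, so the heuristic places it alone on one worker and sends every other sequence to the second worker. The loads stay far from equal, which illustrates why whole-sequence placement alone struggles when lengths are this heterogeneous.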

Videos

Skrull: Towards Efficient Long Context Fine-tuning through Dynamic Data Scheduling · SlidesLive