Strategic Data Ordering: Enhancing Large Language Model Performance through Curriculum Learning
Jisu Kim, Juhwan Lee

TL;DR
This paper introduces a curriculum learning-based data ordering strategy for training large language models. Ordering the training data from simple to complex, using criteria such as prompt length, attention scores, and loss values, improves performance without increasing model size.
Contribution
It proposes a data-centric training approach inspired by curriculum learning that orders training data by prompt length, attention scores, and loss values to improve LLM performance.
Findings
Curriculum learning slightly outperforms random data shuffling.
Sorting data by the proposed attention criteria generally leads to better performance.
The method improves LLM performance without increasing model size or dataset volume.
Abstract
The rapid advancement of Large Language Models (LLMs) has improved text understanding and generation but poses challenges in computational resources. This study proposes a curriculum learning-inspired, data-centric training strategy that begins with simpler tasks and progresses to more complex ones, using criteria such as prompt length, attention scores, and loss values to structure the training data. Experiments with Mistral-7B (Jiang et al., 2023) and Gemma-7B (Team et al., 2024) models demonstrate that curriculum learning slightly improves performance compared to traditional random data shuffling. Notably, we observed that sorting data based on our proposed attention criteria generally led to better performance. This approach offers a sustainable method to enhance LLM performance without increasing model size or dataset volume, addressing scalability challenges in LLM training.
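The simple-to-complex ordering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`prompt_length`, `curriculum_order`, `random_order`), the `"prompt"` field, and the use of a whitespace word count as the length measure are all assumptions made for illustration. Only the prompt-length criterion is shown; the attention and loss criteria would need per-example scores from a model, which are not computed here.

```python
import random
from typing import Callable, Dict, List

Example = Dict[str, str]


def prompt_length(example: Example) -> int:
    """Hypothetical difficulty proxy: whitespace-separated word count of the prompt."""
    return len(example["prompt"].split())


def curriculum_order(
    examples: List[Example],
    difficulty: Callable[[Example], float] = prompt_length,
) -> List[Example]:
    """Return a new list sorted from lowest to highest difficulty score.

    sorted() is stable, so examples with equal scores keep their input order.
    """
    return sorted(examples, key=difficulty)


def random_order(examples: List[Example], seed: int = 0) -> List[Example]:
    """Baseline: a seeded random shuffle of a copy of the data."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Toy data, invented for this sketch.
examples = [
    {"prompt": "Summarize the following long article about climate policy in two sentences."},
    {"prompt": "Translate: hello"},
    {"prompt": "What is the capital of France?"},
]

ordered = curriculum_order(examples)
for ex in ordered:
    print(prompt_length(ex), ex["prompt"])
```

Any other per-example score can be passed through the `difficulty` argument, for example a precomputed loss stored on each example, so the same sorting step would cover the other criteria once such scores exist.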
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
