Strategic Data Ordering: Enhancing Large Language Model Performance through Curriculum Learning

Jisu Kim; Juhwan Lee

arXiv:2405.07490·cs.CL·May 14, 2024

Open Access

TL;DR

This paper introduces a curriculum learning-based data ordering strategy for training large language models. Ordering the training data from simple to complex by prompt length, attention scores, or loss values slightly improves performance over random shuffling, without increasing model size or dataset volume.

Contribution

It proposes a data-centric training approach inspired by curriculum learning that orders training data by attention scores, prompt length, and loss values to improve LLM performance.

Findings

01 Curriculum learning slightly outperforms random data shuffling.

02 Sorting data by the proposed attention criteria generally leads to better performance.

03 The method improves LLM performance without increasing model size or dataset volume.

Abstract

The rapid advancement of Large Language Models (LLMs) has improved text understanding and generation but poses challenges in computational resources. This study proposes a curriculum learning-inspired, data-centric training strategy that begins with simpler tasks and progresses to more complex ones, using criteria such as prompt length, attention scores, and loss values to structure the training data. Experiments with Mistral-7B (Jiang et al., 2023) and Gemma-7B (Team et al., 2024) models demonstrate that curriculum learning slightly improves performance compared to traditional random data shuffling. Notably, we observed that sorting data based on our proposed attention criteria generally led to better performance. This approach offers a sustainable method to enhance LLM performance without increasing model size or dataset volume, addressing scalability challenges in LLM training.
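
The ordering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the whitespace-based prompt length, the assumption that a lower score means a simpler example, and the toy `loss` values are all hypothetical placeholders. Attention and loss scores would have to be computed by a model beforehand; here they are simply read from a field on each example.

```python
import random
from typing import Callable, Dict, List

Example = Dict[str, object]


def prompt_length(example: Example) -> float:
    """Difficulty proxy: number of whitespace-separated tokens in the prompt.

    Assumption: a shorter prompt counts as a simpler example.
    """
    return float(len(str(example["prompt"]).split()))


def precomputed(key: str) -> Callable[[Example], float]:
    """Build a scorer that reads a score stored on the example.

    Hypothetical helper: the score (e.g. a loss or attention value) is
    assumed to have been computed by a model in an earlier pass.
    """
    def score(example: Example) -> float:
        return float(example[key])
    return score


def curriculum_order(examples: List[Example],
                     score_fn: Callable[[Example], float]) -> List[Example]:
    """Return examples sorted from lowest to highest difficulty score."""
    return sorted(examples, key=score_fn)


def random_order(examples: List[Example], seed: int = 0) -> List[Example]:
    """Baseline: a seeded random shuffle of the same examples."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Toy data; the loss values are made up for illustration only.
data = [
    {"prompt": "Summarize the following long passage about climate policy", "loss": 2.1},
    {"prompt": "Translate: hello", "loss": 0.4},
    {"prompt": "Explain why the sky is blue", "loss": 1.3},
]

by_length = curriculum_order(data, prompt_length)
by_loss = curriculum_order(data, precomputed("loss"))
baseline = random_order(data, seed=0)
```

Both curriculum orderings keep the dataset itself unchanged and only alter the sequence in which examples would be fed to training, which is what allows a comparison against the shuffled baseline at equal model size and data volume.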

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques