Training Dynamics of a 1.7B LLaMa Model: A Data-Efficient Approach

Miles Q. Li; Benjamin C. M. Fung; Shih-Chia Huang

arXiv:2412.13335 · cs.CL · April 8, 2025

TL;DR

This paper details the training process, challenges, and results of developing a 1.7-billion-parameter LLaMa-based language model with a focus on data efficiency, practical training considerations, and benchmark performance.

Contribution

The paper gives a comprehensive account of training a large language model, covering data quality, training logistics, and benchmark performance, and it releases the accompanying resources as open source.

Findings

01

High-quality data and scaling improve performance with fewer tokens

02

Restoring optimizer states is crucial for training continuity

03

Hardware changes impact training stability and throughput
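
Finding 02 can be made concrete with a small sketch. The code below is not the authors' training code: it is a toy, scalar re-implementation of Adam in plain Python, with a hypothetical quadratic loss and hypothetical hyperparameters, written only to show what "optimizer state" means. It compares an uninterrupted run against two resumed runs, one that restores the saved moment estimates and step counter and one that restores only the weight. In PyTorch, the analogous calls are `optimizer.state_dict()` and `optimizer.load_state_dict()`.

```python
import math


class Adam:
    """Toy scalar Adam, written out so its internal state is visible."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running estimate of the gradient mean
        self.v = 0.0  # running estimate of the squared-gradient mean
        self.t = 0    # step counter, used for bias correction

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

    def state_dict(self):
        return {"m": self.m, "v": self.v, "t": self.t}

    def load_state_dict(self, state):
        self.m, self.v, self.t = state["m"], state["v"], state["t"]


def grad(w):
    # Gradient of the hypothetical toy loss (w - 3)^2.
    return 2.0 * (w - 3.0)


def train(w, opt, steps):
    for _ in range(steps):
        w = opt.step(w, grad(w))
    return w


# Uninterrupted reference run: 10 steps, checkpoint, 10 more steps.
opt = Adam()
w_mid = train(0.0, opt, 10)
checkpoint = {"w": w_mid, "opt": opt.state_dict()}
w_full = train(w_mid, opt, 10)

# Resume A: restore both the weight and the optimizer state.
opt_restored = Adam()
opt_restored.load_state_dict(checkpoint["opt"])
w_restored = train(checkpoint["w"], opt_restored, 10)

# Resume B: restore only the weight; the moments and step counter restart at zero.
opt_fresh = Adam()
w_fresh = train(checkpoint["w"], opt_fresh, 10)

print("uninterrupted :", w_full)
print("state restored:", w_restored)
print("state dropped :", w_fresh)
```

With the state restored, the resumed run retraces the uninterrupted trajectory; with a fresh optimizer, the moment estimates and bias correction restart and the trajectory diverges. The gap in this toy problem is small and says nothing about its size in a real pretraining run.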

Abstract

Pretraining large language models is a complex endeavor influenced by multiple factors, including model architecture, data quality, training continuity, and hardware constraints. In this paper, we share insights gained from the experience of training DMaS-LLaMa-Lite, a fully open source, 1.7-billion-parameter, LLaMa-based model, on approximately 20 billion tokens of carefully curated data. We chronicle the full training trajectory, documenting how evolving validation loss levels and downstream benchmarks reflect transitions from incoherent text to fluent, contextually grounded output. Beyond pretraining, we extend our analysis to include a post-training phase focused on instruction tuning, where the model was refined to produce more contextually appropriate, user-aligned responses. We highlight practical considerations such as the importance of restoring optimizer states when resuming…

Code & Models

Repositories

mcgill-dmas/dmas-llama-lite-training-code
PyTorch · Official

Taxonomy

Topics: Fault Detection and Control Systems