Training Dynamics of a 1.7B LLaMa Model: A Data-Efficient Approach
Miles Q. Li, Benjamin C. M. Fung, Shih-Chia Huang

TL;DR
This paper details the training process, challenges, and results of developing a 1.7-billion-parameter LLaMa-based language model with a focus on data efficiency, practical training considerations, and benchmark performance.
Contribution
It provides a comprehensive account of training a large language model with a focus on data quality, training logistics, and performance benchmarks, including open-source resources.
Findings
- High-quality data and scaling improve performance with fewer tokens
- Restoring optimizer states is crucial for training continuity
- Hardware changes impact training stability and throughput
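The second finding can be illustrated with a toy sketch. The code below is not the authors' training code: `TinyAdam`, the quadratic loss, and all hyperparameters are hypothetical, chosen only to make the optimizer state visible. It compares an uninterrupted run against two resumed runs, one that restores the saved Adam moments and step count and one that reloads only the weights.

```python
import copy
import math


class TinyAdam:
    """Minimal scalar Adam (hypothetical helper) with its state exposed."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        # The state a checkpoint must carry: step count and both moments.
        self.state = {"step": 0, "m": 0.0, "v": 0.0}

    def update(self, param, grad):
        s = self.state
        s["step"] += 1
        s["m"] = self.beta1 * s["m"] + (1 - self.beta1) * grad
        s["v"] = self.beta2 * s["v"] + (1 - self.beta2) * grad * grad
        # Bias correction depends on the step count, so it is part of the state.
        m_hat = s["m"] / (1 - self.beta1 ** s["step"])
        v_hat = s["v"] / (1 - self.beta2 ** s["step"])
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def grad(w):
    # Gradient of the toy loss (w - 3)^2; stands in for a real backward pass.
    return 2.0 * (w - 3.0)


def train(w, opt, steps):
    for _ in range(steps):
        w = opt.update(w, grad(w))
    return w


# Reference: 20 uninterrupted steps.
ref_opt = TinyAdam()
w_ref = train(0.0, ref_opt, 20)

# Interrupted run: checkpoint the weight AND the optimizer state after 10 steps.
opt = TinyAdam()
w_mid = train(0.0, opt, 10)
checkpoint = {"w": w_mid, "opt_state": copy.deepcopy(opt.state)}

# Resume A: restore weight and optimizer state, then run the remaining steps.
resumed = TinyAdam()
resumed.state = copy.deepcopy(checkpoint["opt_state"])
w_restored = train(checkpoint["w"], resumed, 10)

# Resume B: restore the weight only; the optimizer starts from scratch.
w_fresh = train(checkpoint["w"], TinyAdam(), 10)

print("gap with restored state:", abs(w_restored - w_ref))
print("gap with fresh optimizer:", abs(w_fresh - w_ref))
```

In this toy setting the run that restores the optimizer state retraces the reference trajectory, while the weights-only resume drifts away from it because the moment estimates and bias correction restart from zero. In a framework such as PyTorch, the analogous step is saving and reloading the optimizer's state dictionary alongside the model weights.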
Abstract
Pretraining large language models is a complex endeavor influenced by multiple factors, including model architecture, data quality, training continuity, and hardware constraints. In this paper, we share insights gained from the experience of training DMaS-LLaMa-Lite, a fully open source, 1.7-billion-parameter, LLaMa-based model, on approximately 20 billion tokens of carefully curated data. We chronicle the full training trajectory, documenting how evolving validation loss levels and downstream benchmarks reflect transitions from incoherent text to fluent, contextually grounded output. Beyond pretraining, we extend our analysis to include a post-training phase focused on instruction tuning, where the model was refined to produce more contextually appropriate, user-aligned responses. We highlight practical considerations such as the importance of restoring optimizer states when resuming…
Code & Models
- 🤗 McGill-DMaS/DMaS-LLaMa-Lite-step-2.7k (model)
- 🤗 McGill-DMaS/DMaS-LLaMa-Lite-step-43.5k-instruct (model, 1 dl)
- 🤗 McGill-DMaS/DMaS-LLaMa-Lite-step-3.3k (model)
- 🤗 McGill-DMaS/DMaS-LLaMa-Lite-step-5.1k (model)
- 🤗 McGill-DMaS/DMaS-LLaMa-Lite-step-15.7k (model)
- 🤗 McGill-DMaS/DMaS-LLaMa-Lite-step-35k (model)
- 🤗 McGill-DMaS/DMaS-LLaMa-Lite-step-43.5k (model, 2 dl)
- 🤗 McGill-DMaS/DMaS-LLaMa-Lite-step-7.5k (model)
- 🤗 McGill-DMaS/DMaS-LLaMa-Lite-step-20k (model)
- 🤗 McGill-DMaS/DMaS-LLaMa-Lite-step-23.9k (model, 1 dl)
Taxonomy
Topics: Fault Detection and Control Systems
