Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs
Aldo Pareja, Nikhil Shivakumar Nayak, Hao Wang, Krishnateja Killamsetty, Shivchander Sudalairaj, Wenlong Zhao, Seungwook Han, Abhishek Bhandwaldar, Guangxuan Xu, Kai Xu, Ligong Han, Luke Inglis, Akash Srivastava

TL;DR
This paper is a comprehensive guide to fine-tuning small LLMs (3B-7B parameters) on instruction datasets. It challenges several common practices and offers practical guidance for cost-effective model training.
Contribution
It systematically explores training configurations for small LLMs, revealing effective strategies and debunking some existing training recommendations, thus aiding practitioners in accessible model fine-tuning.
Findings
Larger batch sizes with lower learning rates improve performance.
Early training indicators can predict final model quality.
Certain hyperparameter choices do not significantly affect performance.
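On limited hardware, a larger batch size is usually reached through gradient accumulation rather than by fitting more samples on a device at once. The sketch below shows only the arithmetic behind that idea. The function names and the numbers in the usage example are hypothetical and are not taken from the paper's configurations.

```python
import math


def accumulation_steps(target_batch: int, per_device_batch: int, num_devices: int) -> int:
    """Return the smallest number of gradient-accumulation steps whose
    effective batch size is at least target_batch (hypothetical helper)."""
    if min(target_batch, per_device_batch, num_devices) < 1:
        raise ValueError("all arguments must be positive integers")
    # Samples consumed by one forward/backward pass across all devices.
    samples_per_pass = per_device_batch * num_devices
    return math.ceil(target_batch / samples_per_pass)


def effective_batch_size(per_device_batch: int, num_devices: int, accum_steps: int) -> int:
    """Samples that contribute to a single optimizer update."""
    return per_device_batch * num_devices * accum_steps


if __name__ == "__main__":
    # Illustrative values only: 4 devices, 8 samples each, target of 1024.
    steps = accumulation_steps(target_batch=1024, per_device_batch=8, num_devices=4)
    print(steps, effective_batch_size(8, 4, steps))
```

When the target is not a multiple of the per-pass sample count, the helper rounds up, so the effective batch size can slightly exceed the target.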
Abstract
The rise of large language models (LLMs) has created a significant disparity: industrial research labs with their computational resources, expert teams, and advanced infrastructures, can effectively fine-tune LLMs, while individual developers and small organizations face barriers due to limited resources. In this paper, we aim to bridge this gap by presenting a comprehensive study on supervised fine-tuning of LLMs using instruction-tuning datasets spanning diverse knowledge domains and skills. We focus on small-sized LLMs (3B to 7B parameters) for their cost-efficiency and accessibility. We explore various training configurations and strategies across four open-source pre-trained models. We provide detailed documentation of these configurations, revealing findings that challenge several common training practices, including hyperparameter recommendations from TULU and phased training…
