Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs

Aldo Pareja; Nikhil Shivakumar Nayak; Hao Wang; Krishnateja Killamsetty; Shivchander Sudalairaj; Wenlong Zhao; Seungwook Han; Abhishek Bhandwaldar; Guangxuan Xu; Kai Xu; Ligong Han; Luke Inglis; Akash Srivastava

arXiv:2412.13337 · cs.LG · December 19, 2024 · 2 cites

TL;DR

This paper provides a comprehensive guide for fine-tuning small LLMs (3B-7B parameters) using instruction datasets, challenging common practices, and offering practical insights for cost-effective and efficient model training.

Contribution

The paper systematically explores training configurations for small LLMs, identifies strategies that work, and shows that some existing training recommendations do not hold up, helping practitioners fine-tune models with limited resources.

Findings

01 Larger batch sizes with lower learning rates improve performance.

02 Early training indicators can predict final model quality.

03 Certain hyperparameter choices do not significantly affect performance.
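
The first finding matters most to readers with limited hardware, since a large batch rarely fits on one device. A common way to reach a large effective batch size is gradient accumulation, where the effective batch is the per-device batch times the number of devices times the number of accumulation steps. The sketch below is not taken from the paper: the `BatchPlan` class, the `plan_for_target` helper, and all numeric values are hypothetical and chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchPlan:
    """Hypothetical container describing how a batch is split across hardware."""
    per_device_batch: int   # samples processed per device in one forward/backward pass
    num_devices: int        # number of devices running in data parallel
    grad_accum_steps: int   # passes accumulated before each optimizer step

    @property
    def effective_batch(self) -> int:
        # Samples contributing to a single optimizer update.
        return self.per_device_batch * self.num_devices * self.grad_accum_steps


def plan_for_target(target_batch: int, per_device_batch: int, num_devices: int) -> BatchPlan:
    """Pick the accumulation steps needed to hit target_batch exactly."""
    per_pass = per_device_batch * num_devices
    if per_pass <= 0 or target_batch % per_pass != 0:
        raise ValueError(
            f"target_batch={target_batch} is not a positive multiple of "
            f"per_device_batch * num_devices = {per_pass}"
        )
    return BatchPlan(per_device_batch, num_devices, target_batch // per_pass)


if __name__ == "__main__":
    # Illustrative numbers only; they are assumptions, not the paper's settings.
    plan = plan_for_target(target_batch=1024, per_device_batch=8, num_devices=4)
    print(plan.grad_accum_steps, plan.effective_batch)
```

Raising `grad_accum_steps` trades wall-clock time per update for memory, which is one way a small team could try the large-batch, low-learning-rate setting without adding devices.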

Abstract

The rise of large language models (LLMs) has created a significant disparity: industrial research labs with their computational resources, expert teams, and advanced infrastructures, can effectively fine-tune LLMs, while individual developers and small organizations face barriers due to limited resources. In this paper, we aim to bridge this gap by presenting a comprehensive study on supervised fine-tuning of LLMs using instruction-tuning datasets spanning diverse knowledge domains and skills. We focus on small-sized LLMs (3B to 7B parameters) for their cost-efficiency and accessibility. We explore various training configurations and strategies across four open-source pre-trained models. We provide detailed documentation of these configurations, revealing findings that challenge several common training practices, including hyperparameter recommendations from TULU and phased training…
