Stay Tuned: An Empirical Study of the Impact of Hyperparameters on LLM Tuning in Real-World Applications

Alon Halfon; Shai Gretz; Ofir Arviv; Artem Spector; Orith Toledo-Ronen; Yoav Katz; Liat Ein-Dor; Michal Shmueli-Scheuer; Noam Slonim

arXiv:2407.18990·cs.LG·August 8, 2024·3 cites

TL;DR

This study empirically investigates how hyperparameter choices affect large language model tuning in real-world scenarios, providing practical recommendations and a coverage-based search method to optimize tuning configurations efficiently.

Contribution

It introduces Coverage-based Search (CBS), a process for ranking hyperparameter configurations, and offers empirically validated recommendations for tuning Llama-3-8B and Mistral-7B with full fine-tuning and LoRA.

Findings

01

Llama-3-8B and LoRA are generally preferred for tuning.

02

A small number of hyperparameter configurations can yield excellent results.

03

Coverage-based search effectively identifies robust hyperparameter settings.
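
The idea of ranking configurations so that the top few jointly cover many datasets can be illustrated with a greedy, set-cover-style sketch. This is one plausible reading of a coverage-based ranking, not the authors' CBS procedure: the `tolerance` threshold for deciding that a configuration "covers" a dataset, the tie-break by mean score, the function name `coverage_rank`, and the example scores are all assumptions made for illustration.

```python
from typing import Dict, List

Scores = Dict[str, Dict[str, float]]  # scores[config][dataset] -> metric, higher is better


def coverage_rank(scores: Scores, tolerance: float = 0.01) -> List[str]:
    """Greedily rank configurations by how many still-uncovered datasets they cover.

    Assumption (not from the paper): a configuration "covers" a dataset when its
    score is within `tolerance` of the best score any configuration reached there.
    """
    configs = sorted(scores)
    datasets = sorted({d for per_ds in scores.values() for d in per_ds})
    missing = float("-inf")

    # Best score observed on each dataset across the whole grid.
    best = {d: max(scores[c].get(d, missing) for c in configs) for d in datasets}

    # Datasets on which each configuration is near-optimal.
    covers = {
        c: {d for d in datasets if scores[c].get(d, missing) >= best[d] - tolerance}
        for c in configs
    }

    def mean_score(c: str) -> float:
        vals = scores[c].values()
        return sum(vals) / len(vals) if vals else missing

    ranking: List[str] = []
    uncovered = set(datasets)
    remaining = list(configs)
    while remaining:
        # Prefer the configuration that adds the most new coverage;
        # break ties by mean score (an assumed tie-break).
        chosen = max(remaining, key=lambda c: (len(covers[c] & uncovered), mean_score(c)))
        ranking.append(chosen)
        remaining.remove(chosen)
        uncovered -= covers[chosen]
    return ranking


# Hypothetical grid-search results: three configurations, three datasets.
example_scores: Scores = {
    "lr=1e-5": {"A": 0.90, "B": 0.70, "C": 0.80},
    "lr=5e-5": {"A": 0.89, "B": 0.85, "C": 0.60},
    "lr=1e-4": {"A": 0.50, "B": 0.60, "C": 0.81},
}
example_ranking = coverage_rank(example_scores, tolerance=0.02)
print(example_ranking)
```

In this toy grid no single configuration is near-optimal on all three datasets, but the first two in the ranking together are, which is the sense in which a short list of configurations can serve as a robust starting point.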

Abstract

Fine-tuning Large Language Models (LLMs) is an effective method to enhance their performance on downstream tasks. However, choosing the appropriate setting of tuning hyperparameters (HPs) is a labor-intensive and computationally expensive process. Here, we provide recommended HP configurations for practical use-cases that represent a better starting point for practitioners when considering two SOTA LLMs and two commonly used tuning methods. We describe Coverage-based Search (CBS), a process for ranking HP configurations based on an extensive offline grid search, such that the top-ranked configurations collectively provide a practical, robust recommendation for a wide range of datasets and domains. We focus our experiments on Llama-3-8B and Mistral-7B, as well as full fine-tuning and LoRA, conducting a total of more than 10,000 tuning experiments. Our results suggest that, in general, Llama-3-8B…

Taxonomy

Topics: Simulation Techniques and Applications · Speech Recognition and Synthesis · Traffic Prediction and Management Techniques