Fine-Tuning or Fine-Failing? Debunking Performance Myths in Large Language Models

Scott Barnett; Zac Brannelly; Stefanus Kurniawan; Sheng Wong

arXiv:2406.11201·cs.CL·July 2, 2024·6 cites

Open Access

TL;DR

This paper investigates the effects of fine-tuning large language models within Retrieval-Augmented Generation (RAG) systems, revealing that, contrary to common expectations, fine-tuning can sometimes decrease performance.

Contribution

It provides empirical evidence that fine-tuning LLMs in RAG pipelines may impair their ability to extract and utilize contextual information, challenging assumptions about fine-tuning benefits.

Findings

1. Fine-tuning reduced accuracy in RAG tasks across multiple domains.

2. Unlike its effect on standalone LLMs, fine-tuning can degrade performance in retrieval-based systems.

3. The results highlight the importance of validating fine-tuning effects for domain-specific applications.

Abstract

Large Language Models (LLMs) have the unique capability to understand and generate human-like text from input queries. When fine-tuned, these models show enhanced performance on domain-specific queries. OpenAI highlights the process of fine-tuning, stating: "To fine-tune a model, you are required to provide at least 10 examples. We typically see clear improvements from fine-tuning on 50 to 100 training examples, but the right number varies greatly based on the exact use case." This study extends this concept to the integration of LLMs within Retrieval-Augmented Generation (RAG) pipelines, which aim to improve accuracy and relevance by leveraging external corpus data for information retrieval. However, RAG's promise of delivering optimal responses often falls short in complex query scenarios. This study aims to specifically examine the effects of fine-tuning LLMs on their ability to…

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · WordPiece · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay