RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture

Angels Balaguer; Vinamra Benara; Renato Luiz de Freitas Cunha; Roberto de M. Estevão Filho; Todd Hendry; Daniel Holstein; Jennifer Marsman; Nick Mecklenburg; Sara Malvar; Leonardo O. Nunes; Rafael Padilha; Morris Sharp; Bruno Silva; Swati Sharma; Vijay Aski; Ranveer Chandra

arXiv:2401.08406·cs.CL·January 31, 2024·50 cites

Open Access

TL;DR

This paper compares Retrieval-Augmented Generation and fine-tuning for LLMs, presenting a pipeline, tradeoffs, and a case study in agriculture demonstrating improved accuracy and geographic knowledge integration.

Contribution

It introduces a comprehensive pipeline for RAG and fine-tuning, evaluates their tradeoffs across multiple LLMs, and applies this to agriculture for location-specific insights.

Findings

01 Fine-tuning improves accuracy by over 6 percentage points.

02 Combining RAG with fine-tuning yields an additional 5 percentage point accuracy gain.

03 Fine-tuned models better leverage cross-geography information, increasing answer similarity from 47% to 72%.

Abstract

There are two common ways in which developers are incorporating proprietary and domain-specific data when building applications of Large Language Models (LLMs): Retrieval-Augmented Generation (RAG) and fine-tuning. RAG augments the prompt with the external data, while fine-tuning incorporates the additional knowledge into the model itself. However, the pros and cons of both approaches are not well understood. In this paper, we propose a pipeline for fine-tuning and RAG, and present the tradeoffs of both for multiple popular LLMs, including Llama2-13B, GPT-3.5, and GPT-4. Our pipeline consists of multiple stages, including extracting information from PDFs, generating questions and answers, using them for fine-tuning, and leveraging GPT-4 for evaluating the results. We propose metrics to assess the performance of different stages of the RAG and fine-tuning pipeline. We conduct an in-depth…
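The abstract's one-line description of RAG, augmenting the prompt with external data, can be illustrated with a minimal sketch. This is not the paper's pipeline: the bag-of-words cosine retriever, the function names (`retrieve`, `build_rag_prompt`), the prompt wording, and the three sample passages are all assumptions chosen to keep the example self-contained. A real system would typically use embedding-based retrieval and send the resulting prompt to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question (toy retriever)."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q_vec, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(question, passages, k=2):
    """Augment the prompt with retrieved passages (hypothetical template)."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Use the context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Illustrative passages only; they are not taken from the paper's dataset.
PASSAGES = [
    "Winter wheat is usually planted in the fall and harvested in early summer.",
    "Drip irrigation delivers water directly to the root zone of the plant.",
    "Soil pH affects how well crops can take up nutrients.",
]

prompt = build_rag_prompt("When is winter wheat planted?", PASSAGES, k=1)
print(prompt)
```

Fine-tuning, by contrast, would leave the prompt as the bare question and change the model's weights instead, which is why the two approaches can be combined.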

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Linear Layer · WordPiece · Linear Warmup With Linear Decay · Cosine Annealing