Establishing Performance Baselines in Fine-Tuning, Retrieval-Augmented Generation and Soft-Prompting for Non-Specialist LLM Users

Jennifer Dodgson; Lin Nanzheng; Julian Peh; Akira Rafhael Janson; Pattirane; Alfath Daryl Alhajir; Eko Ridho Dinarto; Joseph Lim; Syed Danyal; Ahmad

arXiv:2311.05903 · cs.IR · March 20, 2024 · 2 cites

Open Access

TL;DR

This study establishes baseline performance metrics for fine-tuning, retrieval-augmented generation, and soft-prompting on GPT 3.5, demonstrating that RAG and soft prompts enhance non-specialist user capabilities.

Contribution

It provides a comparative analysis of unmodified, fine-tuned, RAG, and soft-prompted GPT 3.5 models using accessible, default settings for non-technical users.

Findings

01 RAG outperforms fine-tuning and base GPT 3.5 Turbo.

02 Soft prompts significantly improve model performance.

03 Baseline results facilitate accessible deployment for non-experts.

Abstract

Research into methods for improving the performance of large language models (LLMs) through fine-tuning, retrieval-augmented generation (RAG) and soft-prompting has tended to focus on the use of highly technical or high-cost techniques, making many of the newly discovered approaches comparatively inaccessible to non-technical users. In this paper we tested an unmodified version of GPT 3.5, a fine-tuned version, and the same unmodified model when given access to a vectorised RAG database, both in isolation and in combination with a basic, non-algorithmic soft prompt. In each case we tested the model's ability to answer a set of 100 questions relating primarily to events that occurred after September 2021 (the point at which GPT 3.5's training data set ends). We found that if commercial platforms are used and default settings are applied with no iteration in order to establish a baseline…
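The experimental design described in the abstract (three model setups, each tested with and without a soft prompt, on the same question set) can be laid out as a small evaluation grid. The following is a minimal sketch under stated assumptions, not the authors' code: the names `MODELS`, `conditions`, and `score_conditions` are hypothetical, `answer_fn` stands in for whatever call reaches the model, and exact-match scoring is an assumption, since the scoring method is not given in this excerpt.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# The three setups named in the abstract; the labels themselves are illustrative.
MODELS = ["unmodified", "fine_tuned", "rag"]
# Each setup is tested in isolation and in combination with a soft prompt.
SOFT_PROMPT = [False, True]

Condition = Tuple[str, bool]


def conditions() -> List[Condition]:
    """Return every (model setup, soft prompt on/off) pair."""
    return list(product(MODELS, SOFT_PROMPT))


def score_conditions(
    answer_fn: Callable[[str, bool, str], str],
    questions: List[Tuple[str, str]],
) -> Dict[Condition, float]:
    """Fraction of questions answered correctly under each condition.

    answer_fn(model, use_soft_prompt, question) is a hypothetical callable
    supplied by the caller. Case-insensitive exact match against the
    reference answer is an assumption made for this sketch.
    """
    if not questions:
        raise ValueError("questions must not be empty")
    scores: Dict[Condition, float] = {}
    for model, soft in conditions():
        correct = sum(
            answer_fn(model, soft, question).strip().lower() == gold.strip().lower()
            for question, gold in questions
        )
        scores[(model, soft)] = correct / len(questions)
    return scores
```

With the 100 questions mentioned in the abstract, `score_conditions` would return six fractions, one per condition, which can then be compared directly as baselines.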

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management

Methods: Sparse Evolutionary Training · Multi-Head Attention · Attention Is All You Need · Linear Warmup With Linear Decay · Discriminative Fine-Tuning · Linear Layer · WordPiece · Cosine Annealing · Attention Dropout · Adam