GPT vs RETRO: Exploring the Intersection of Retrieval and Parameter-Efficient Fine-Tuning

Aleksander Ficek; Jiaqi Zeng; Oleksii Kuchaiev

arXiv:2407.04528·cs.CL·October 28, 2024


TL;DR

This study applies parameter-efficient fine-tuning (PEFT) methods to retrieval-augmented RETRO and GPT models. RETRO performs better in zero-shot settings, GPT shows higher performance potential with PEFT, and 8B models strike the best balance between cost and performance.

Contribution

First comprehensive comparison of PEFT methods applied to RAG-enhanced GPT and RETRO models across multiple sizes, highlighting their relative strengths and performance trade-offs.

Findings

01. RETRO models outperform GPT models in zero-shot settings, which the authors attribute to RETRO's pre-training process.

02. GPT models have higher performance potential with PEFT techniques.

03. 8B-parameter models offer the best balance between cost and performance.

Abstract

Parameter-Efficient Fine-Tuning (PEFT) and Retrieval-Augmented Generation (RAG) have become popular methods for adapting large language models while minimizing compute requirements. In this paper, we apply PEFT methods (P-tuning, Adapters, and LoRA) to a modified Retrieval-Enhanced Transformer (RETRO) and a baseline GPT model across several sizes, ranging from 823 million to 48 billion parameters. We show that RETRO models outperform GPT models in zero-shot settings due to their unique pre-training process, but GPT models have higher performance potential with PEFT. Additionally, our study indicates that 8B parameter models strike an optimal balance between cost and performance and that P-tuning lags behind other PEFT techniques. We further provide a comparative analysis between applying PEFT to an instruction-tuned RETRO model and a base RETRO model. This work presents the first comprehensive…
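
To illustrate why a method such as LoRA counts as parameter-efficient, here is a minimal sketch of the standard LoRA formulation, y = W x + (alpha / r) · B (A x), in which the base weight W stays frozen and only the low-rank factors A and B are trained. The function names, matrix sizes, rank, and alpha below are illustrative assumptions. They are not taken from the paper, and this listing does not state the configuration the authors used.

```python
# Illustrative LoRA sketch in pure Python; all names and sizes are
# hypothetical and not taken from the paper.

def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A has shape rank x d_in, B has shape d_out x rank; W stays frozen.
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_in * d_out


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x)."""
    base = matvec(W, x)                # frozen pre-trained path
    update = matvec(B, matvec(A, x))   # low-rank trainable path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    # Assumed sizes for illustration only: a 4096 x 4096 projection, rank 8.
    lora = lora_trainable_params(4096, 4096, 8)
    full = full_params(4096, 4096)
    print(f"LoRA trains {lora} of {full} weights ({100 * lora / full:.2f}%)")
```

With B initialized to zeros, as is conventional for LoRA, the low-rank path contributes nothing and the adapted layer starts out identical to the frozen base layer.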


Taxonomy

Topics: Machine Learning and Algorithms · Algorithms and Data Compression

Methods: Attention Is All You Need · WordPiece · Cosine Annealing · Linear Layer · Attention Dropout · Linear Warmup With Linear Decay · BART · Weight Decay · BERT