Efficient In-Domain Question Answering for Resource-Constrained Environments

Isaac Chung; Phat Vo; Arman C. Kizilkale; Aaron Reite

arXiv:2409.17648 · cs.CL · October 18, 2024

TL;DR

This paper introduces CRAFT, a resource-efficient retrieval augmented fine tuning method combining RAFT and LoRA, enabling effective question answering in environments with limited computational resources.

Contribution

It proposes a novel combination of RAFT and LoRA to improve efficiency and reduce resource requirements for knowledge-intensive QA tasks.

Findings

01 CRAFT achieves comparable performance to larger models.

02 It significantly reduces fine tuning and storage needs.

03 Faster inference times are demonstrated in resource-constrained settings.

Abstract

Retrieval Augmented Generation (RAG) is a common method for integrating external knowledge into pretrained Large Language Models (LLMs) to enhance accuracy and relevancy in question answering (QA) tasks. However, prompt engineering and resource efficiency remain significant bottlenecks in developing optimal and robust RAG solutions for real-world QA applications. Recent studies have shown success in using fine tuning to address these problems; in particular, Retrieval Augmented Fine Tuning (RAFT) applied to smaller 7B models has demonstrated superior performance compared to RAG setups with much larger models such as GPT-3.5. The combination of RAFT with parameter-efficient fine tuning (PEFT) techniques, such as Low-Rank Adaptation (LoRA), promises an even more efficient solution, yet remains an unexplored area. In this work, we combine RAFT with LoRA to reduce fine tuning and storage…
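
To make the storage argument concrete, the following is a minimal sketch of the general LoRA idea, not the paper's implementation. It assumes a frozen weight matrix W of shape d_out x d_in that is adapted by a low-rank product B·A scaled by alpha / rank, so only A and B need to be trained and stored. The helper names (`full_params`, `lora_params`, `lora_forward`), the hidden size of 4096, and the rank of 8 are illustrative assumptions and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def full_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is fine tuned."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, rank: int) -> Vector:
    """Compute W x + (alpha / rank) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    # Assumed sizes for illustration only: one 4096 x 4096 projection, rank 8.
    d, r = 4096, 8
    print("full fine tuning:", full_params(d, d))
    print("LoRA adapter:    ", lora_params(d, d, r))
    print("reduction factor:", full_params(d, d) // lora_params(d, d, r))
```

Under these assumed sizes the adapter for a single projection holds 256 times fewer trainable values than the full matrix. This is the kind of saving that the fine tuning and storage reduction mentioned in the abstract relies on; the actual ranks and target layers used by the authors are not stated in this excerpt.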

Taxonomy

Topics: Distributed and Parallel Computing Systems · Multimodal Machine Learning Applications · AI-based Problem Solving and Planning

Methods: Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · WordPiece · Linear Warmup With Linear Decay · Linear Layer · Weight Decay · Byte Pair Encoding · BERT