Adversarial Databases Improve Success in Retrieval-based Large Language Models

Sean Wu; Michael Koo; Li Yo Kao; Andy Black; Lesley Blum; Fabien Scalzo; Ira Kurtz

arXiv:2407.14609 · cs.CL · July 23, 2024

Open Access

TL;DR

This study reveals that using adversarial background information in Retrieval-Augmented Generation can unexpectedly enhance the performance of open-source large language models in answering medical multiple-choice questions.

Contribution

The study demonstrates for the first time that adversarial datasets can improve the success of RAG-based LLMs, challenging the prior assumption that such datasets harm performance.

Findings

01. Adversarial Bible text improved LLM performance in MCQ tasks.

02. Random word datasets also enhanced some models' success.

03. Most models benefited from relevant background databases.

Abstract

Open-source LLMs have shown great potential as fine-tuned chatbots, demonstrating robust reasoning abilities and surpassing many existing benchmarks. Retrieval-Augmented Generation (RAG) is a technique for improving the performance of LLMs on tasks that the models weren't explicitly trained on, by leveraging external knowledge databases. Numerous studies have demonstrated that RAG accomplishes downstream tasks more successfully when it uses vector datasets that consist of relevant background information. It has been implicitly assumed by those in the field that if adversarial background information is utilized in this context, a RAG-based approach would provide no benefit or would even negatively impact the results. To address this assumption, we tested several open-source LLMs on the ability of RAG to improve their success in answering multiple-choice…
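The setup the abstract describes (retrieve background text for a multiple-choice question, then place it in the prompt) can be illustrated with a minimal sketch. This is not the authors' pipeline: it swaps the vector datasets for a simple bag-of-words cosine similarity so that it runs with the standard library alone, and every name in it (`retrieve`, `build_prompt`, `RELEVANT_DB`, `ADVERSARIAL_DB`) is hypothetical. The passages and the question are toy placeholders, not the paper's data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, database, k=1):
    """Return the k passages most similar to the question.

    Assumption: a real system would compare dense embeddings here.
    """
    q = Counter(tokenize(question))
    ranked = sorted(
        database,
        key=lambda passage: cosine(q, Counter(tokenize(passage))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, options, database, k=1):
    """Assemble a prompt: retrieved context, then the question and choices."""
    context = "\n".join(retrieve(question, database, k))
    choices = "\n".join(f"{label}. {text}" for label, text in options)
    return f"Context:\n{context}\n\nQuestion: {question}\n{choices}\nAnswer:"


# Toy placeholder databases (hypothetical, not taken from the paper).
RELEVANT_DB = [
    "Hyperkalemia is an elevated level of potassium in the blood.",
    "Metabolic acidosis is characterized by a low serum bicarbonate.",
]
ADVERSARIAL_DB = [
    "The harvest festival lasted seven days in the village.",
    "A shepherd counted sheep beside the river.",
]

QUESTION = "Which electrolyte is elevated in hyperkalemia?"
OPTIONS = [("A", "Sodium"), ("B", "Potassium"), ("C", "Calcium"), ("D", "Chloride")]

relevant_prompt = build_prompt(QUESTION, OPTIONS, RELEVANT_DB)
adversarial_prompt = build_prompt(QUESTION, OPTIONS, ADVERSARIAL_DB)
```

Both prompts have the same question and answer choices; only the retrieved context differs. Sending each prompt to the same model and comparing accuracy over many questions is the kind of comparison the abstract describes, though the model call itself is left out here because it depends on the serving setup.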

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Linear Warmup With Linear Decay · Multi-Head Attention · Weight Decay · Residual Connection · Dropout · WordPiece