Adversarial Databases Improve Success in Retrieval-based Large Language Models
Sean Wu, Michael Koo, Li Yo Kao, Andy Black, Lesley Blum, Fabien Scalzo, Ira Kurtz

TL;DR
This study reveals that using adversarial background information in Retrieval-Augmented Generation can unexpectedly enhance the performance of open-source large language models in answering medical multiple-choice questions.
Contribution
It demonstrates for the first time that adversarial datasets can improve RAG-based LLM success, challenging previous assumptions about their negative impact.
Findings
Adversarial Bible text improved LLM performance in MCQ tasks.
Random word datasets also enhanced some models' success.
Most models benefited from relevant background databases.
Abstract
Open-source LLMs have shown great potential as fine-tuned chatbots, demonstrating robust reasoning abilities and surpassing many existing benchmarks. Retrieval-Augmented Generation (RAG) is a technique for improving the performance of LLMs on tasks that the models were not explicitly trained on by leveraging external knowledge databases. Numerous studies have demonstrated that RAG accomplishes downstream tasks more successfully when it uses vector datasets consisting of relevant background information. Those in the field have implicitly assumed that if adversarial background information is used in this context, a RAG-based approach would provide no benefit or would even negatively impact the results. To address this assumption, we tested several open-source LLMs on the ability of RAG to improve their success in answering multiple-choice…
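The retrieval step the abstract describes can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the bag-of-words similarity stands in for a real vector embedding, and the function names (`embed`, `retrieve`, `build_prompt`), the two tiny databases, and the sample question are all hypothetical. It only shows how the same multiple-choice question can be paired with context drawn from either a relevant or an adversarial database before being passed to an LLM.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a vector embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, choices, passages, k=1):
    """Prepend retrieved context to a multiple-choice question."""
    context = "\n".join(retrieve(question, passages, k))
    options = "\n".join(f"{label}. {text}" for label, text in choices)
    return f"Context:\n{context}\n\nQuestion: {question}\n{options}\nAnswer:"


# Hypothetical data, invented for illustration only.
relevant_db = [
    "The proximal tubule of the kidney reabsorbs most filtered bicarbonate.",
    "Loop diuretics act on the thick ascending limb of the loop of Henle.",
]
adversarial_db = [
    "In the beginning God created the heaven and the earth.",
    "And the earth was without form, and void.",
]
question = "Which segment of the kidney reabsorbs most filtered bicarbonate?"
choices = [("A", "Proximal tubule"), ("B", "Collecting duct")]

relevant_prompt = build_prompt(question, choices, relevant_db)
adversarial_prompt = build_prompt(question, choices, adversarial_db)
```

Retrieval always returns the nearest passages, even when nothing in the database relates to the question, so the adversarial prompt still carries context. Comparing a model's accuracy across the two prompt variants (and a no-context baseline) is the kind of comparison the study reports.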
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Linear Warmup With Linear Decay · Multi-Head Attention · Weight Decay · Residual Connection · Dropout · WordPiece
