Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature
Ranul Dayarathne, Uvini Ranaweera, Upeksha Ganegoda

TL;DR
This study compares the effectiveness of open-source and proprietary Large Language Models in RAG-based question-answering within computer science literature, highlighting Mistral-7b-instruct's superior performance among open-source models and GPT-3.5's overall effectiveness.
Contribution
The study provides a comparative analysis of multiple LLMs on RAG-based QA tasks in a single domain, computer science literature, showing that open-source models have the potential to match proprietary ones.
Findings
GPT-3.5 with RAG performs best overall.
Mistral-7b-instruct outperforms other open-source LLMs.
Orca-mini-v3-7b has the shortest response latency.
Abstract
Retrieval Augmented Generation (RAG) is emerging as a powerful technique for enhancing Generative AI models by reducing hallucination. The growing prominence of RAG alongside Large Language Models (LLMs) has therefore sparked interest in comparing how different LLMs perform at question-answering (QA) across diverse domains. This study compares four open-source LLMs (Mistral-7b-instruct, LLaMa2-7b-chat, Falcon-7b-instruct and Orca-mini-v3-7b) and OpenAI's popular GPT-3.5 on RAG-supported QA tasks within the computer science literature. Binary questions are evaluated with accuracy and precision; long-answer questions are evaluated with ranking by a human expert, ranking by Google's AI model Gemini, and cosine similarity. GPT-3.5, when paired with RAG, effectively answers binary and long-answer questions,…
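The abstract names the automatic metrics but not their exact definitions. The following is a minimal Python sketch of one common way to compute them, under stated assumptions: binary answers are coded as booleans with "yes" (True) as the positive class for precision, and cosine similarity compares a model's long answer with a reference answer. The function names `accuracy`, `precision` and `cosine_similarity` and the bag-of-words term-count vectors are hypothetical choices for illustration; the study may have compared sentence embeddings instead.

```python
from collections import Counter
import math


def accuracy(predicted: list[bool], gold: list[bool]) -> float:
    """Fraction of binary (yes/no) answers that match the reference answers."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def precision(predicted: list[bool], gold: list[bool]) -> float:
    """Precision for the positive class (assumed "yes" = True): TP / (TP + FP)."""
    if len(predicted) != len(gold):
        raise ValueError("lists must have equal length")
    tp = sum(p and g for p, g in zip(predicted, gold))        # predicted yes, truly yes
    fp = sum(p and not g for p, g in zip(predicted, gold))    # predicted yes, truly no
    return tp / (tp + fp) if (tp + fp) else 0.0


def cosine_similarity(answer: str, reference: str) -> float:
    """Cosine similarity between bag-of-words term-count vectors.

    Hypothetical stand-in: the study may have used dense embeddings instead.
    """
    a = Counter(answer.lower().split())
    b = Counter(reference.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())  # only shared terms contribute
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty answer shares nothing with the reference
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    preds = [True, False, True, True]
    gold = [True, False, False, True]
    print(accuracy(preds, gold))   # 0.75
    print(precision(preds, gold))  # 0.6666666666666666
    print(round(cosine_similarity("RAG reduces hallucination",
                                  "RAG can reduce hallucination"), 3))  # 0.577
```

Precision is undefined when a model never answers "yes"; returning 0.0 in that case is a convention chosen here, not something the abstract specifies.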
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Text Readability and Simplification
