Comparing the Performance of LLMs in RAG-based Question-Answering: A Case Study in Computer Science Literature

Ranul Dayarathne; Uvini Ranaweera; Upeksha Ganegoda

arXiv:2511.03261·cs.CL·November 6, 2025

Open Access

TL;DR

This study compares open-source and proprietary Large Language Models on RAG-based question-answering over computer science literature. Mistral-7b-instruct performs best among the open-source models, and GPT-3.5 performs best overall.

Contribution

The paper provides a comparative analysis of five LLMs on RAG-based QA tasks in computer science literature, and shows that open-source models have the potential to match proprietary ones.

Findings

01 GPT-3.5 with RAG performs best overall.

02 Mistral-7b-instruct outperforms other open-source LLMs.

03 Orca-mini-v3-7b has the shortest response latency.

Abstract

Retrieval Augmented Generation (RAG) is emerging as a powerful technique to enhance the capabilities of Generative AI models by reducing hallucination. Thus, the increasing prominence of RAG alongside Large Language Models (LLMs) has sparked interest in comparing the performance of different LLMs in question-answering (QA) in diverse domains. This study compares the performance of four open-source LLMs, Mistral-7b-instruct, LLaMa2-7b-chat, Falcon-7b-instruct and Orca-mini-v3-7b, and OpenAI's trending GPT-3.5 over QA tasks within the computer science literature leveraging RAG support. Evaluation metrics employed in the study include accuracy and precision for binary questions and ranking by a human expert, ranking by Google's AI model Gemini, alongside cosine similarity for long-answer questions. GPT-3.5, when paired with RAG, effectively answers binary and long-answer questions,…
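The automatic metrics named in the abstract (accuracy and precision for binary questions, cosine similarity for long-answer questions) can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the "yes"/"no" labels, and the use of plain term-count vectors for cosine similarity are all assumptions made for illustration, since the text representation used in the study is not stated here.

```python
import math
from collections import Counter


def accuracy(predicted, expected):
    """Fraction of binary answers that match the reference answers."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def precision(predicted, expected, positive="yes"):
    """Of the answers predicted as `positive`, the fraction that are truly positive."""
    true_pos = sum(p == positive and e == positive for p, e in zip(predicted, expected))
    predicted_pos = sum(p == positive for p in predicted)
    return true_pos / predicted_pos if predicted_pos else 0.0


def cosine_similarity(text_a, text_b):
    """Cosine similarity between term-count vectors.

    Assumption: whitespace-tokenized word counts stand in for whatever
    text representation the study actually used.
    """
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical model answers and reference answers for four binary questions.
model_answers = ["yes", "no", "yes", "yes"]
reference_answers = ["yes", "no", "no", "yes"]
print(accuracy(model_answers, reference_answers))
print(precision(model_answers, reference_answers))

# Hypothetical long-answer pair: generated answer vs. reference answer.
print(cosine_similarity("RAG reduces hallucination", "RAG helps reduce hallucination"))
```

In practice, cosine similarity for long answers is often computed over sentence embeddings rather than word counts; the word-count version above only shows the shape of the calculation.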

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Text Readability and Simplification