The Battle of LLMs: A Comparative Study in Conversational QA Tasks
Aryan Rangapur, Aman Rangapur

TL;DR
This paper compares the performance of leading large language models like ChatGPT, GPT-4, Gemini, Mixtral, and Claude on conversational question-answering tasks, highlighting their strengths and weaknesses through detailed evaluation.
Contribution
It provides a comprehensive comparative analysis of recent advanced language models in conversational QA, revealing their relative strengths, weaknesses, and areas for improvement.
Findings
Models show varying accuracy across different datasets.
Some models are more prone to generating incorrect answers.
Performance differences highlight potential for targeted enhancements.
Abstract
Large language models have gained considerable interest for their impressive performance on various tasks. Within this domain, ChatGPT and GPT-4, developed by OpenAI, and the Gemini, developed by Google, have emerged as particularly popular among early adopters. Additionally, Mixtral by Mistral AI and Claude by Anthropic are newly released, further expanding the landscape of advanced language models. These models are viewed as disruptive technologies with applications spanning customer service, education, healthcare, and finance. More recently, Mistral has entered the scene, captivating users with its unique ability to generate creative content. Understanding the perspectives of these users is crucial, as they can offer valuable insights into the potential strengths, weaknesses, and overall success or failure of these technologies in various domains. This research delves into the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsArtificial Intelligence in Law · Natural Language Processing Techniques
Methodstravel james · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout
