The Battle of LLMs: A Comparative Study in Conversational QA Tasks

Aryan Rangapur; Aman Rangapur

arXiv:2405.18344 · cs.CL · May 29, 2024 · 5 cites

TL;DR

This paper compares the performance of leading large language models like ChatGPT, GPT-4, Gemini, Mixtral, and Claude on conversational question-answering tasks, highlighting their strengths and weaknesses through detailed evaluation.

Contribution

The paper provides a comprehensive comparative analysis of recent advanced language models on conversational QA, identifying their relative strengths, weaknesses, and areas for improvement.

Findings

01. Models show varying accuracy across different datasets.
02. Some models are more prone to generating incorrect answers.
03. Performance differences highlight potential for targeted enhancements.

Abstract

Large language models have attracted considerable interest for their impressive performance on a wide range of tasks. Within this domain, ChatGPT and GPT-4, developed by OpenAI, and Gemini, developed by Google, have emerged as particularly popular among early adopters. Additionally, Mixtral by Mistral AI and Claude by Anthropic have recently been released, further expanding the landscape of advanced language models. These models are viewed as disruptive technologies with applications spanning customer service, education, healthcare, and finance. More recently, Mistral has entered the scene, captivating users with its unique ability to generate creative content. Understanding the perspectives of these users is crucial, as they can offer valuable insights into the potential strengths, weaknesses, and overall success or failure of these technologies in various domains. This research delves into the…

Taxonomy

Topics: Artificial Intelligence in Law · Natural Language Processing Techniques

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout