QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs

Mohammad Aflah Khan; Neemesh Yadav; Sarah Masud; Md. Shad Akhtar

arXiv:2412.11763·cs.CL·December 17, 2024

TL;DR

QUENCH is a benchmark that evaluates the reasoning and world knowledge of large language models using geographically contextualized quiz questions transcribed from YouTube quiz videos, exposing where the models succeed and where they fail.

Contribution

This paper introduces QUENCH, a novel, manually curated benchmark for assessing LLMs' reasoning and world knowledge in a geographically contextualized, zero-shot quiz setting.

Findings

01. LLMs' performance varies with model size and prompting style.

02. Geographical context influences LLM reasoning capabilities.

03. Error analysis reveals common reasoning pitfalls.

Abstract

The rise of large language models (LLMs) has created a need for advanced benchmarking systems beyond traditional setups. To this end, we introduce QUENCH, a novel text-based English Quizzing Benchmark manually curated and transcribed from YouTube quiz videos. Each QUENCH question contains masked entities, paired with a rationale, which the LLMs must predict via generation. At the intersection of geographical context and common sense reasoning, QUENCH helps assess the world knowledge and deduction capabilities of LLMs via a zero-shot, open-domain quizzing setup. We perform an extensive evaluation on 7 LLMs and 4 metrics, investigating the influence of model size, prompting style, geographical context, and gold-labeled rationale generation. The benchmarking concludes with an analysis of the errors to which the LLMs are prone.
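The masked-entity, zero-shot setup described in the abstract can be pictured with a small sketch. Everything here is an assumption made for illustration: the `QuizItem` fields, the prompt wording, the `"indic"` label, the sample question, and the normalized exact-match score are all hypothetical, and none of them is taken from the QUENCH dataset or its four metrics.

```python
from dataclasses import dataclass
import string


@dataclass
class QuizItem:
    """Hypothetical record layout for one masked quiz question."""
    question: str   # question text with masked entities such as "X"
    answers: dict   # mask name -> gold entity
    rationale: str  # gold explanation of the answer
    context: str    # assumed label, e.g. "indic" or "non-indic"


def build_zero_shot_prompt(item, with_rationale=False):
    """Build a prompt with no solved examples (zero-shot)."""
    masks = ", ".join(sorted(item.answers))
    instruction = f"Identify the masked entities ({masks}) in the quiz question below."
    if with_rationale:
        instruction += " Also explain your reasoning."
    return f"{instruction}\n\nQuestion: {item.question}\nAnswer:"


def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(item, predictions):
    """Fraction of masks whose prediction equals the gold entity after normalization."""
    hits = sum(
        normalize(predictions.get(mask, "")) == normalize(gold)
        for mask, gold in item.answers.items()
    )
    return hits / len(item.answers)


# Invented example item, not drawn from the benchmark.
demo = QuizItem(
    question="X is the river on whose banks the city of Varanasi stands.",
    answers={"X": "Ganga"},
    rationale="Varanasi lies on the banks of the Ganga.",
    context="indic",
)
prompt = build_zero_shot_prompt(demo, with_rationale=True)
score = exact_match(demo, {"X": " ganga. "})
```

Scoring per mask rather than per question is one possible design choice; it gives partial credit when a question masks several entities and the model recovers only some of them.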

Code & Models

Repositories

aflah02/quench (Official)

Taxonomy

Topics: Artificial Intelligence in Law · Multi-Agent Systems and Negotiation