CricBench: A Multilingual Benchmark for Evaluating LLMs in Cricket Analytics

Parth Agarwal; Navya Kommuri; Trizal Garg; Prisha Singhal; Dhruv Shah; Vaibhav Devraj; Yash Sinha; Jagat Sesh Challa; Murari Mandal; Dhruv Kumar

arXiv:2512.21877 · cs.CL · April 14, 2026

TL;DR

CricBench is a multilingual benchmark suite that evaluates the SQL generation capabilities of large language models on cricket analytics across four formats (Test, ODI, T20I, and IPL) and four languages (English, Hindi, Punjabi, and Telugu).

Contribution

The paper introduces the first Text-to-SQL benchmark for cricket: a curated gold-standard dataset of 2,654 evaluation instances in four languages, and an evaluation of seven models on domain-specific tasks using schema-only prompting.

Findings

01. No single model dominates across all cricket formats: GPT-5 Mini leads on Test cricket (12.4% DMA), while Qwen 235B leads on IPL (28.7%).

02. Models show high syntactic but low semantic accuracy in SQL generation.

03. A significant domain gap exists compared to existing benchmarks.
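
The gap between syntactic and semantic accuracy in the second finding can be illustrated with a small sketch. It is not the paper's evaluation harness: the `innings` table, the three queries, and the helpers `run_query` and `score` are all hypothetical, and the result-match check is only one plausible way to test semantic correctness, since the exact definition of DMA is not given on this page. The example assumes a domain nuance that a model could easily miss: a batting average divides runs by dismissals, not by innings played.

```python
import sqlite3


def run_query(conn, sql):
    """Return sorted result rows, or None if the query fails to execute."""
    try:
        return sorted(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def score(conn, predicted_sql, gold_sql):
    """Return (executes, matches_gold) for one predicted query."""
    predicted = run_query(conn, predicted_sql)
    gold = run_query(conn, gold_sql)
    executes = predicted is not None
    matches = executes and gold is not None and predicted == gold
    return executes, matches


# Hypothetical toy schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE innings (player TEXT, format TEXT, runs INTEGER, dismissed INTEGER);
INSERT INTO innings VALUES
  ('A', 'Test', 100, 1),
  ('A', 'Test',  50, 0),
  ('B', 'Test',  60, 1),
  ('B', 'Test',  60, 1);
""")

# Gold query: batting average = total runs / number of dismissals.
gold_sql = (
    "SELECT player, SUM(runs) * 1.0 / SUM(dismissed) "
    "FROM innings WHERE format = 'Test' GROUP BY player"
)
# Valid SQL, but averages over innings, so not-out innings are miscounted.
naive_sql = (
    "SELECT player, AVG(runs) "
    "FROM innings WHERE format = 'Test' GROUP BY player"
)
# Invalid SQL: the column name is misspelled.
broken_sql = "SELECT player, AVG(run) FROM innings GROUP BY player"

print(score(conn, gold_sql, gold_sql))    # (True, True)
print(score(conn, naive_sql, gold_sql))   # (True, False)
print(score(conn, broken_sql, gold_sql))  # (False, False)
```

The naive query is the interesting case: it executes without error, so it counts as syntactically valid, yet it returns 75.0 instead of 150.0 for player A because the not-out innings is treated as a dismissal.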

Abstract

Cricket is the second most popular sport worldwide, with billions of fans seeking advanced statistical insights unavailable through standard web searches. Although LLMs have advanced significantly in Text-to-SQL tasks, their capability to handle domain-specific nuances and multilingual requirements in sports analytics remains under-explored. We present CricBench, a benchmark suite evaluating the intrinsic SQL generation abilities of LLMs on cricket data across four formats: Test, ODI, T20I, and IPL. We curate a Gold-Standard dataset of 2,654 evaluation instances across four languages (English, Hindi, Punjabi, and Telugu). We evaluate seven models, GPT-5 Mini, Claude Sonnet 4, DeepSeek R1 and V3, Qwen 235B, Llama 3.1, and Gemma 2, using schema-only prompting. No single model dominates across all formats: GPT-5 Mini leads on Test cricket (12.4% DMA), Qwen 235B leads on IPL (28.7%) and…
