NSMQ Riddles: A Benchmark of Scientific and Mathematical Riddles for Quizzing Large Language Models
George Boateng, Naafi Ibrahim, Samuel John, Philemon Badu, Patrick Agyeman-Budu, Jonathan Mensah, Kevin Yeboah, William Edor, Andrew Mensa-Onumah, Nana Yeboah, Victor Wumbor-Apin Kumbol

TL;DR
This paper introduces NSMQ Riddles, a challenging benchmark dataset of Ghanaian science and math riddles for evaluating large language models' reasoning abilities, highlighting their current limitations.
Contribution
The paper presents a novel benchmark dataset drawn from Ghana's NSMQ competition, adding Global South representation to the assessment of LLMs' scientific and mathematical reasoning.
Findings
State-of-the-art LLMs perform worse than top student contestants.
The dataset is challenging even for advanced models.
The benchmark promotes global diversity in AI evaluation.
Abstract
Large Language Models (LLMs) have shown good performance on various science educational benchmarks, demonstrating their potential for use in science and mathematics education. Yet, LLMs tend to be evaluated on science and mathematical educational datasets from the Western world, with an underrepresentation of datasets from the Global South. Furthermore, these datasets tend to have multiple-choice answer options that are trivial to evaluate. In this work, we present NSMQ Riddles, a novel benchmark of Scientific and Mathematical Riddles from Ghana's National Science and Maths Quiz (NSMQ) competition to evaluate LLMs. The NSMQ is an annual live TV competition for senior secondary school students in Ghana that brings together the country's smartest high school students, who compete in teams of two by answering questions in biology, chemistry, physics, and math over five rounds and five stages until a…
