NSMQ Riddles: A Benchmark of Scientific and Mathematical Riddles for Quizzing Large Language Models

George Boateng; Naafi Ibrahim; Samuel John; Philemon Badu; Patrick Agyeman-Budu; Jonathan Mensah; Kevin Yeboah; William Edor; Andrew Mensa-Onumah; Nana Yeboah; Victor Wumbor-Apin Kumbol

arXiv:2605.07051·cs.CL·May 11, 2026


TL;DR

This paper introduces NSMQ Riddles, a challenging benchmark dataset of Ghanaian science and math riddles for evaluating large language models' reasoning abilities, highlighting their current limitations.

Contribution

It presents a novel, culturally diverse benchmark dataset from Ghana's NSMQ competition to assess LLMs in scientific and mathematical reasoning.

Findings

01. State-of-the-art LLMs perform worse than top student contestants.

02. The dataset is challenging even for advanced models.

03. The benchmark promotes global diversity in AI evaluation.

Abstract

Large Language Models (LLMs) have shown good performance on various science education benchmarks, demonstrating their potential for use in science and mathematics education. Yet LLMs tend to be evaluated on science and mathematics education datasets from the Western world, and datasets from the Global South are underrepresented. Furthermore, these datasets tend to use multiple-choice answer options, which are trivial to evaluate. In this work, we present NSMQ Riddles, a novel benchmark of Scientific and Mathematical Riddles from Ghana's National Science and Maths Quiz (NSMQ) competition to evaluate LLMs. The NSMQ is an annual live TV competition that brings together the smartest senior secondary school students in Ghana, who compete in teams of two by answering questions in biology, chemistry, physics, and math over five rounds and five stages until a…
