The Riddle of Reflection: Evaluating Reasoning and Self-Awareness in Multilingual LLMs using Indian Riddles

Abhinav P M; Ojasva Saxena; Oswald C; Parameswari Krishnamurthy

arXiv:2511.00960·cs.CL·November 5, 2025

Open Access

TL;DR

This study evaluates multilingual reasoning and self-awareness in large language models across seven Indian languages using a new riddle dataset, revealing significant gaps in reasoning accuracy and self-assessment capabilities.

Contribution

Introduces a multilingual Indian riddle dataset and systematically assesses reasoning and self-awareness of five LLMs across multiple languages and prompting strategies.

Findings

01. Gemini 2.5 Pro performs best overall in riddle-solving.

02. Self-evaluation accuracy is inversely related to initial reasoning accuracy.

03. LLaMA 4 Scout shows higher self-awareness than top-performing models.

Abstract

The extent to which large language models (LLMs) can perform culturally grounded reasoning across non-English languages remains underexplored. This paper examines the reasoning and self-assessment abilities of LLMs across seven major Indian languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Tamil, and Telugu. We introduce a multilingual riddle dataset combining traditional riddles with context-reconstructed variants and evaluate five LLMs (Gemini 2.5 Pro, Gemini 2.5 Flash, Mistral-Saba, LLaMA 4 Scout, and LLaMA 4 Maverick) under seven prompting strategies. In the first stage, we assess riddle-solving performance and find that while Gemini 2.5 Pro performs best overall, few-shot methods yield only marginal gains, and accuracy varies notably across languages. In the second stage, we conduct a self-evaluation experiment to measure reasoning consistency. The results reveal a key finding:…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)