Multilingual LLMs Are Not Multilingual Thinkers: Evidence from Hindi Analogy Evaluation
Ashray Gupta, Rohan Joseph, Sunny Rai

TL;DR
This paper introduces a Hindi Analogy Test Set (HATS) to evaluate how well multilingual LLMs reason in Hindi. Models perform best with English prompts, which points to limits in their multilingual reasoning abilities.
Contribution
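The paper presents the first Hindi Analogy Test Set (HATS), comprising 405 multiple-choice questions sourced from Indian government exams. It benchmarks multilingual LLMs under several prompting strategies and proposes a grounded Chain of Thought approach, based on cognitive theories of analogical reasoning, to improve reasoning in Hindi.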
Findings
Models perform best with English prompts, irrespective of the prompting strategy.
Grounded Chain of Thought improves Hindi analogy reasoning.
Multilingual LLMs show limited reasoning in Hindi.
Abstract
Analogies test a model's ability to infer implicit relationships between concepts, making them a key benchmark for evaluating reasoning capabilities. While large language models (LLMs) are widely evaluated for reasoning in English, their abilities in Indic languages remain understudied, limiting our understanding of whether these models generalize across languages. To address this gap, we introduce a new Hindi Analogy Test Set (HATS), comprising 405 multiple-choice questions sourced from Indian government exams. We benchmark state-of-the-art multilingual LLMs using various prompting strategies and introduce a grounded Chain of Thought approach that leverages cognitive theories of analogical reasoning. This approach improves model performance on Hindi analogy questions. Our experiments show that models perform best with English prompts, irrespective of the prompting strategy. Our test…
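The benchmarking setup described in the abstract (multiple-choice analogy questions scored under different prompts) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `AnalogyItem` schema, the `format_prompt` layout, and the `predict` callable standing in for a model are hypothetical and are not taken from the paper. The two toy items are made up and are not HATS questions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AnalogyItem:
    """One multiple-choice analogy question (hypothetical schema)."""
    question: str
    options: List[str]
    answer_index: int  # index of the correct option in `options`


def format_prompt(item: AnalogyItem, instruction: str) -> str:
    """Join an instruction, the question, and lettered options into one prompt."""
    letters = "ABCDEFGH"
    lines = [instruction, item.question]
    lines += [f"{letters[i]}. {opt}" for i, opt in enumerate(item.options)]
    return "\n".join(lines)


def accuracy(
    items: List[AnalogyItem],
    instruction: str,
    predict: Callable[[str], int],
) -> float:
    """Fraction of items where `predict` returns the correct option index."""
    if not items:
        return 0.0
    correct = sum(
        predict(format_prompt(item, instruction)) == item.answer_index
        for item in items
    )
    return correct / len(items)


if __name__ == "__main__":
    # Toy items for illustration only; not drawn from HATS.
    toy_items = [
        AnalogyItem("hand : glove :: foot : ?", ["sock", "hat"], 0),
        AnalogyItem("bird : nest :: bee : ?", ["web", "hive"], 1),
    ]
    # Hypothetical instructions in two prompt languages.
    instructions: Dict[str, str] = {
        "en": "Choose the correct option.",
        "hi": "सही विकल्प चुनें।",
    }
    # Placeholder "model" that always picks the first option.
    always_first = lambda prompt: 0
    for lang, instruction in instructions.items():
        print(lang, accuracy(toy_items, instruction, always_first))
```

Swapping `always_first` for a function that queries a real model and parses its chosen option would allow the same items to be scored under each prompt language, which is the kind of comparison the abstract reports.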
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Translation Studies and Practices
