Eye-Q: A Multilingual Benchmark for Visual Word Puzzle Solving and Image-to-Phrase Reasoning
Ali Najar, Alireza Mirrokni, Arshia Izadyari, Sadegh Mohammadian, Amir Homayoon Sharifizade, Asal Meskin, Mobin Bagherian, Ehsaneddin Asgari

TL;DR
Eye-Q introduces a challenging multilingual benchmark with visual word puzzles that test complex reasoning, abstraction, and cross-lingual understanding in vision-language models, exposing significant gaps in current capabilities.
Contribution
This paper presents Eye-Q, a novel multilingual benchmark for visual word puzzle solving that emphasizes reasoning and inference beyond surface recognition.
Findings
State-of-the-art models achieve only 60.27% accuracy on Eye-Q.
Models struggle with abstract and cross-lingual puzzles.
Current models lack flexible conceptual reasoning for image-to-phrase tasks.
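For a concrete sense of what an accuracy figure like the one above measures, here is a minimal sketch that assumes normalized exact-match scoring of a predicted phrase against the target phrase. This page does not describe the paper's actual metric, so the scoring rule and the helper names `normalize` and `exact_match_accuracy` are illustrative assumptions, not the authors' evaluation code.

```python
import unicodedata


def normalize(text: str) -> str:
    """Normalize a phrase for comparison (assumed rule, not from the paper)."""
    # NFKC folds compatibility forms; casefold handles case across scripts.
    text = unicodedata.normalize("NFKC", text).casefold()
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


def exact_match_accuracy(predictions: list[str], targets: list[str]) -> float:
    """Fraction of puzzles whose predicted phrase equals the target after normalization."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(normalize(p) == normalize(t) for p, t in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    # Hypothetical answers for two puzzles: one match, one miss.
    preds = ["Ice Cream", "rainbow"]
    golds = ["ice  cream", "raincoat"]
    print(exact_match_accuracy(preds, golds))
```

Exact match is the strictest reasonable choice for a single target word or phrase. A benchmark that accepts synonyms or transliterations would need a more lenient comparison than the one sketched here.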
Abstract
Vision-Language Models (VLMs) have achieved strong performance on standard vision-language benchmarks, yet often rely on surface-level recognition rather than deeper reasoning. We propose visual word puzzles as a challenging alternative, as they require discovering implicit visual cues, generating and revising hypotheses, and mapping perceptual evidence to non-literal concepts in ways that are difficult to solve via literal grounding, OCR-heavy shortcuts, or simple retrieval-style matching. We introduce Eye-Q, a multilingual benchmark designed to assess this form of complex visual understanding. Eye-Q contains 1,343 puzzles in which a model observes a conceptually dense scene with a brief description and must infer a specific target word or phrase. The puzzles are intentionally unstructured and cue-implicit, with distractors and contextual relationships that demand selective attention,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Neurobiology of Language and Bilingualism
