NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models

Pranshu Pandya; Vatsal Gupta; Agney S Talwarr; Tushar Kataria; Dan Roth; Vivek Gupta

arXiv:2407.10380·cs.CV·April 2, 2025

TL;DR

NTSEBench is a new multi-modal reasoning dataset of 2,728 multiple-choice questions and 4,642 images, designed to evaluate the cognitive reasoning abilities of large vision-language models beyond simple pattern recognition.

Contribution

The paper introduces NTSEBench, a comprehensive dataset for assessing complex cognitive reasoning in vision-language models, along with baseline evaluations and modeling strategies.

Findings

01

State-of-the-art models show limited performance on complex reasoning tasks.

02

The dataset covers diverse question types from the NTSE exam.

03

The proposed modeling strategies improve how models handle multi-modal reasoning questions.

Abstract

Cognitive textual and visual reasoning tasks, including puzzles, series, and analogies, demand the ability to quickly reason about, decipher, and evaluate patterns both textually and spatially. Owing to extensive training on vast amounts of human-curated data, LLMs and VLMs excel at common-sense reasoning tasks, but they still struggle with more complex reasoning that demands deeper cognitive understanding. We introduce NTSEBench, a new dataset designed to evaluate the cognitive multi-modal reasoning and problem-solving skills of large models. The dataset contains 2,728 multiple-choice questions, accompanied by a total of 4,642 images, categorized into 26 different types. These questions are drawn from the nationwide NTSE examination in India and feature a mix of visual and textual general aptitude challenges, designed to assess intelligence and critical thinking skills beyond mere rote learning. We…

Taxonomy

Topics: Multimodal Machine Learning Applications