JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics

Simindokht Jahangard; Mehrzad Mohammadi; Yi Shen; Zhixi Cai; Hamid Rezatofighi

arXiv:2508.10287 · cs.CV · August 21, 2025

TL;DR

JRDB-Reasoning introduces a new benchmark with adjustable difficulty and detailed reasoning annotations, enabling comprehensive evaluation of visual reasoning in robotics within complex, human-crowded environments.

Contribution

The paper presents a formalization of reasoning complexity, an adaptive question generation engine, and an extended dataset with structured annotations for visual reasoning in robotics.

Findings

01 Enables fine-grained evaluation of visual reasoning models

02 Supports dynamic assessment across different reasoning levels

03 Provides structured, step-by-step reasoning workflows

Abstract

Recent advances in vision-language models (VLMs) and large language models (LLMs) have greatly enhanced visual reasoning, a key capability for embodied AI agents like robots. However, existing visual reasoning benchmarks often suffer from several limitations: they lack a clear definition of reasoning complexity, offer no control over generating questions of varying difficulty or customizing tasks, and fail to provide structured, step-by-step reasoning annotations (workflows). To bridge these gaps, we formalize reasoning complexity, introduce an adaptive query engine that generates customizable questions of varying complexity with detailed intermediate annotations, and extend the JRDB dataset with human-object interaction and geometric relationship annotations to create JRDB-Reasoning, a benchmark tailored for visual reasoning in human-crowded environments. Our engine and benchmark…
