TL;DR
JRDB-Reasoning introduces a new benchmark with adjustable difficulty and detailed reasoning annotations, enabling comprehensive evaluation of visual reasoning in robotics within complex, human-crowded environments.
Contribution
The paper presents a formalization of reasoning complexity, an adaptive query engine that generates questions of varying complexity, and an extension of the JRDB dataset with human-object interaction and geometric relationship annotations for visual reasoning in robotics.
Findings
Enables fine-grained evaluation of visual reasoning models
Supports dynamic assessment across different reasoning levels
Provides structured, step-by-step reasoning workflows
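To make the idea of adjustable reasoning levels with step-by-step workflows concrete, here is a minimal sketch. It is not the paper's engine or its formalization of complexity: the toy `SCENE`, the `Step` and `Question` classes, the `generate_question` function, and the choice to measure complexity as the number of chained relation lookups are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    query: str   # the single lookup performed at this step
    result: str  # the intermediate answer it produced


@dataclass
class Question:
    text: str
    answer: str
    workflow: list  # ordered list of Step objects


# Hypothetical toy scene: (entity, relation) -> related entity.
SCENE = {
    ("person_1", "holding"): "phone_1",
    ("person_1", "next_to"): "person_2",
    ("person_2", "sitting_on"): "bench_1",
    ("bench_1", "in_front_of"): "door_1",
}


def generate_question(scene, start, relations):
    """Chain `relations` from `start`, recording every intermediate step.

    Assumption: complexity is simply len(relations), i.e. the number of hops.
    """
    current = start
    expr = start
    workflow = []
    for rel in relations:
        nxt = scene[(current, rel)]  # raises KeyError if the hop does not exist
        workflow.append(Step(query=f"{rel}({current})", result=nxt))
        expr = f"{rel}({expr})"      # nest the relation around the running expression
        current = nxt
    return Question(
        text=f"Which entity satisfies {expr}?",
        answer=current,
        workflow=workflow,
    )


easy = generate_question(SCENE, "person_1", ["holding"])
hard = generate_question(SCENE, "person_1", ["next_to", "sitting_on", "in_front_of"])
```

In this sketch, `easy` needs one lookup and `hard` needs three, and each `workflow` lists the intermediate answers a model could be scored against at every step, not only on the final answer.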
Abstract
Recent advances in vision-language models (VLMs) and large language models (LLMs) have greatly enhanced visual reasoning, a key capability for embodied AI agents such as robots. However, existing visual reasoning benchmarks suffer from several limitations: they lack a clear definition of reasoning complexity, offer no control over question difficulty or task customization, and fail to provide structured, step-by-step reasoning annotations (workflows). To bridge these gaps, we formalize reasoning complexity, introduce an adaptive query engine that generates customizable questions of varying complexity with detailed intermediate annotations, and extend the JRDB dataset with human-object interaction and geometric relationship annotations to create JRDB-Reasoning, a benchmark tailored for visual reasoning in human-crowded environments. Our engine and benchmark…