Multimodal Datasets and Benchmarks for Reasoning about Dynamic Spatio-Temporality in Everyday Environments

Takanori Ugai; Kensho Hara; Shusaku Egami; Ken Fukuda

arXiv:2408.11347 · cs.AI · September 18, 2024

Open Access

TL;DR

This paper introduces a new multimodal dataset and benchmark created using a 3D simulator to evaluate AI's ability to understand dynamic spatio-temporal aspects of everyday environments, aiding Embodied AI development.

Contribution

It presents a novel synthetic dataset and benchmark for reasoning about human behavior and the environment in home settings, facilitating progress in Embodied AI.

Findings

01

Preliminary experiments suggest the dataset is effective for measuring AI understanding of daily life.

02

The dataset enables evaluation of AI reasoning about spatio-temporal dynamics.

03

The approach supports development of more capable Embodied AI systems.

Abstract

We used a 3D simulator to create artificial video data with standardized annotations, aiming to aid in the development of Embodied AI. Our question answering (QA) dataset measures the extent to which a robot can understand human behavior and the environment in a home setting. Preliminary experiments suggest our dataset is useful in measuring AI's comprehension of daily life.

Taxonomy

Topics: Geographic Information Systems Studies · Speech and dialogue systems