Multimodal Datasets and Benchmarks for Reasoning about Dynamic Spatio-Temporality in Everyday Environments
Takanori Ugai, Kensho Hara, Shusaku Egami, Ken Fukuda

TL;DR
This paper introduces a new multimodal dataset and benchmark created using a 3D simulator to evaluate AI's ability to understand dynamic spatio-temporal aspects of everyday environments, aiding Embodied AI development.
Contribution
It presents a novel synthetic dataset and benchmark for reasoning about human behavior and the environment in home settings, facilitating progress in Embodied AI.
Findings
Preliminary experiments suggest the dataset is effective for measuring AI understanding of daily life.
The dataset enables evaluation of AI reasoning about spatio-temporal dynamics.
The approach supports development of more capable Embodied AI systems.
Abstract
We used a 3D simulator to create artificial video data with standardized annotations, aiming to aid in the development of Embodied AI. Our question answering (QA) dataset measures the extent to which a robot can understand human behavior and the environment in a home setting. Preliminary experiments suggest our dataset is useful in measuring AI's comprehension of daily life.
Taxonomy
Topics: Geographic Information Systems Studies · Speech and dialogue systems
