Learning to Act with Affordance-Aware Multimodal Neural SLAM
Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme

TL;DR
This paper introduces AMSLAM, a multimodal neural SLAM system that improves exploration, planning, and vision-and-language grounding in embodied AI tasks, with a reported improvement of more than 40% on the ALFRED benchmark.
Contribution
It presents the first multimodal neural SLAM that predicts affordance-aware semantic maps and plans over them, improving exploration and long-horizon planning in embodied AI.
Findings
Over 40% improvement on ALFRED benchmark
Achieved 23.48% success rate on unseen test scenes
Enhanced exploration efficiency and vision-language grounding
Abstract
Recent years have witnessed an emerging paradigm shift toward embodied artificial intelligence, in which an agent must learn to solve challenging tasks by interacting with its environment. There are several challenges in solving embodied multimodal tasks, including long-horizon planning, vision-and-language grounding, and efficient exploration. We focus on a critical bottleneck, namely the performance of planning and navigation. To tackle this challenge, we propose a Neural SLAM approach that, for the first time, utilizes several modalities for exploration, predicts an affordance-aware semantic map, and plans over it at the same time. This significantly improves exploration efficiency, leads to robust long-horizon planning, and enables effective vision-and-language grounding. With the proposed Affordance-aware Multimodal Neural SLAM (AMSLAM) approach, we obtain more than 40% improvement…
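As a rough illustration of what "planning over an affordance-aware semantic map" can mean, the sketch below runs a breadth-first search on a tiny hand-written grid. It is a toy under stated assumptions, not the AMSLAM method: the paper predicts its map with a learned model, whereas here the map is hard-coded and plain BFS stands in for the planner. All names (`plan_to_affordance`, `occupancy`, `affordances`, the `"pickup"` label) are hypothetical.

```python
from collections import deque

FREE, WALL = 0, 1  # hypothetical occupancy codes for this toy map


def plan_to_affordance(occupancy, affordances, start, wanted):
    """Return the shortest 4-connected path from `start` to the nearest
    free cell whose affordance set contains `wanted`, or None if no such
    cell is reachable."""
    rows, cols = len(occupancy), len(occupancy[0])
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if wanted in affordances.get(cell, ()):
            # Walk parent links back to the start to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and occupancy[nr][nc] == FREE and nxt not in parents:
                parents[nxt] = cell
                queue.append(nxt)
    return None


# Toy map (assumed): a wall splits the room, leaving one corridor on the right.
occupancy = [
    [FREE, FREE, FREE],
    [WALL, WALL, FREE],
    [FREE, FREE, FREE],
]
# Assumed affordance layer: the agent can pick something up from cell (2, 0).
affordances = {(2, 0): {"pickup"}}

path = plan_to_affordance(occupancy, affordances, (0, 0), "pickup")
print(path)
```

The point of the affordance layer is that the goal is not a coordinate but "any reachable cell where the desired interaction is possible", so the planner can choose where to stand as part of the search.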
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
