Evaluating Explanation Methods for Vision-and-Language Navigation
Guanqi Chen, Lei Yang, Guanhua Chen, Jia Pan

TL;DR
This paper develops a benchmark and evaluation pipeline to assess explanation methods for vision-and-language navigation models, aiming to improve interpretability of AI decision-making in robotic navigation tasks.
Contribution
It introduces a new erasure-based evaluation pipeline and benchmarks explanation methods for VLN models, addressing the gap in interpretability research for navigation tasks.
Findings
Explanation methods vary in faithfulness for VLN models
The proposed evaluation pipeline effectively measures step-wise explanations
Insights into model decision-making processes are revealed
Abstract
The ability to navigate robots with natural language instructions in an unknown environment is a crucial step for achieving embodied artificial intelligence (AI). With the improving performance of deep neural models proposed in the field of vision-and-language navigation (VLN), it is equally interesting to know what information the models utilize for their decision-making in the navigation tasks. To understand the inner workings of deep neural models, various explanation methods have been developed for promoting explainable AI (XAI). But they are mostly applied to deep neural models for image or text classification tasks and little work has been done in explaining deep neural models for VLN tasks. In this paper, we address these problems by building quantitative benchmarks to evaluate explanation methods for VLN models in terms of faithfulness. We propose a new erasure-based evaluation…
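The general idea behind an erasure-based faithfulness check can be sketched as follows. This is a minimal illustration, not the paper's pipeline, whose details are cut off in the abstract above. The assumption is that a faithful explanation ranks highly the input features whose removal most reduces the model's confidence in the action it chose at a given step. The toy linear model, the zero baseline, and every name here (`erasure_drop`, `predict`, `faithful`, `misleading`) are hypothetical stand-ins.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over candidate actions.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def erasure_drop(predict, x, scores, action, fraction, baseline=0.0):
    """Drop in the probability of `action` after erasing the top-scored features.

    predict:  callable mapping a feature vector to action probabilities
    scores:   one importance score per feature, produced by an explanation method
    fraction: share of features to erase, highest score first
    baseline: value written over erased features (assumed to be 0 here)
    """
    k = int(round(fraction * x.size))
    top = np.argsort(-np.asarray(scores))[:k]
    erased = np.array(x, dtype=float)
    erased[top] = baseline
    return float(predict(x)[action] - predict(erased)[action])


# Hypothetical stand-in for a single navigation step:
# two candidate actions, three input features, only feature 0 matters.
W = np.array([[2.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])


def predict(features):
    return softmax(W @ features)


x = np.array([1.0, 1.0, 1.0])
action = int(np.argmax(predict(x)))

# Two candidate explanations for the chosen action.
faithful = W[action] * x                 # ranks feature 0 first
misleading = np.array([0.0, 1.0, 2.0])   # ranks the irrelevant feature 2 first

drop_faithful = erasure_drop(predict, x, faithful, action, fraction=1 / 3)
drop_misleading = erasure_drop(predict, x, misleading, action, fraction=1 / 3)
print(drop_faithful, drop_misleading)
```

In this toy setting, erasing the feature ranked first by the faithful explanation lowers the chosen action's probability, while erasing the feature ranked first by the misleading one leaves it unchanged. A step-wise evaluation would repeat such a measurement at every navigation step and over several erasure fractions.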
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications
