General Evaluation for Instruction Conditioned Navigation using Dynamic Time Warping

Gabriel Ilharco; Vihan Jain; Alexander Ku; Eugene Ie; Jason Baldridge

arXiv:1907.05446·cs.RO·December 2, 2019·29 cites

TL;DR

This paper introduces normalized Dynamic Time Warping (nDTW) and Success-constrained DTW (SDTW) as evaluation metrics for instruction-conditioned navigation. The metrics correlate better with human judgments than existing ones and, when nDTW is used as a reward signal, improve RL agent performance.

Contribution

It proposes new DTW-based metrics for navigation evaluation that address flaws in existing metrics and demonstrates their effectiveness through experiments.

Findings

01 nDTW correlates better with human judgments than existing metrics.

02 Using nDTW as a reward improves RL navigation performance.

03 SDTW outperforms previous success-constrained metrics on the R4R dataset.

Abstract

In instruction conditioned navigation, agents interpret natural language and their surroundings to navigate through an environment. Datasets for studying this task typically contain pairs of these instructions and reference trajectories. Yet, most evaluation metrics used thus far fail to properly account for the latter, relying instead on insufficient similarity comparisons. We address fundamental flaws in previously used metrics and show how Dynamic Time Warping (DTW), a long-known method of measuring similarity between two time series, can be used for evaluation of navigation agents. To this end, we define the normalized Dynamic Time Warping (nDTW) metric, which softly penalizes deviations from the reference path, is naturally sensitive to the order of the nodes composing each path, is suited for both continuous and graph-based evaluations, and can be efficiently calculated. Further, we…
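
To make the metric concrete, here is a minimal sketch of nDTW and SDTW. The truncated abstract above does not give the formulas, so the forms used here are assumptions: nDTW as exp(-DTW(R, Q) / (|R| * d_th)), and SDTW as nDTW gated by whether the agent's final position lies within d_th of the reference goal. The function names, the Euclidean distance, and the default threshold of 3.0 are illustrative choices, not the authors' reference implementation.

```python
import math


def dtw(reference, query, dist=math.dist):
    """Accumulated DTW cost between two point sequences, O(|R| * |Q|)."""
    n, m = len(reference), len(query)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning reference[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(reference[i - 1], query[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance along the reference only
                cost[i][j - 1],      # advance along the query only
                cost[i - 1][j - 1],  # advance along both
            )
    return cost[n][m]


def ndtw(reference, query, threshold=3.0):
    """Assumed form: exp(-DTW / (|R| * d_th)); 1.0 means a perfect match."""
    return math.exp(-dtw(reference, query) / (len(reference) * threshold))


def sdtw(reference, query, threshold=3.0):
    """Assumed form: nDTW if the query ends within d_th of the goal, else 0."""
    success = math.dist(reference[-1], query[-1]) <= threshold
    return ndtw(reference, query, threshold) if success else 0.0


# Hypothetical 2D paths: the second visits the same points in reverse order.
reference_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
reversed_path = list(reversed(reference_path))

print(ndtw(reference_path, reference_path))  # identical paths
print(ndtw(reference_path, reversed_path))   # same nodes, wrong order
print(sdtw(reference_path, reversed_path, threshold=1.0))  # ends far from goal
```

Because the alignment is monotonic, the reversed path scores lower than the identical one even though it visits the same set of points, which illustrates the order sensitivity the abstract describes. On a navigation graph, `dist` could be swapped for a shortest-path distance between nodes.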

Code & Models

Repositories

google-research-datasets/RxR
tf

Taxonomy

Topics: Speech and dialogue systems · Time Series Analysis and Forecasting · Topic Modeling