Jump to better conclusions: SCAN both left and right

Jasmijn Bastings; Marco Baroni; Jason Weston; Kyunghyun Cho; Douwe Kiela

arXiv:1809.04640·cs.CL·June 22, 2020

TL;DR

This paper takes a closer look at whether the SCAN dataset actually tests systematic generalization in sequence-to-sequence models, introduces NACS, a complementary dataset that requires mapping action sequences back to the original commands, and shows that models that do well on SCAN do not necessarily do well on NACS.

Contribution

The paper reveals limitations of SCAN in measuring true systematic generalization and proposes NACS as a more realistic complementary dataset for evaluation.

Findings

01

Models that do well on SCAN do not necessarily do well on NACS.

02

NACS exhibits properties more closely aligned with realistic use cases for sequence-to-sequence models.

03

SCAN does not always capture the kind of systematic generalization it was designed to test.

Abstract

Lake and Baroni (2018) recently introduced the SCAN data set, which consists of simple commands paired with action sequences and is intended to test the strong generalization abilities of recurrent sequence-to-sequence models. Their initial experiments suggested that such models may fail because they lack the ability to extract systematic rules. Here, we take a closer look at SCAN and show that it does not always capture the kind of generalization that it was designed for. To mitigate this we propose a complementary dataset, which requires mapping actions back to the original commands, called NACS. We show that models that do well on SCAN do not necessarily do well on NACS, and that NACS exhibits properties more closely aligned with realistic use-cases for sequence-to-sequence models.
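The relationship between the two datasets described in the abstract can be illustrated with a minimal Python sketch. It assumes that each SCAN example is stored as one text line of the form `IN: <command> OUT: <actions>`; the helper names `parse_scan_line`, `to_nacs`, and `group_by_source` are hypothetical and are not taken from the paper or its repository. The sketch only swaps source and target, which is the idea behind NACS as stated above, and is not the authors' data pipeline.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (source tokens, target tokens)


def parse_scan_line(line: str) -> Pair:
    """Parse one line, assumed to look like 'IN: <command> OUT: <actions>'."""
    body = line.strip()
    if not body.startswith("IN:") or "OUT:" not in body:
        raise ValueError(f"unexpected line format: {line!r}")
    command, actions = body[len("IN:"):].split("OUT:", 1)
    return command.split(), actions.split()


def to_nacs(pair: Pair) -> Pair:
    """Reverse a SCAN pair so that the actions become the source."""
    command, actions = pair
    return actions, command


def group_by_source(pairs: List[Pair]) -> Dict[str, List[str]]:
    """Collect every target seen for each source sequence."""
    grouped = defaultdict(list)
    for source, target in pairs:
        grouped[" ".join(source)].append(" ".join(target))
    return dict(grouped)


# Illustrative lines only (assumed format), not read from the released files.
example_lines = [
    "IN: jump twice OUT: I_JUMP I_JUMP",
    "IN: jump and jump OUT: I_JUMP I_JUMP",
    "IN: walk OUT: I_WALK",
]

scan_pairs = [parse_scan_line(line) for line in example_lines]
nacs_pairs = [to_nacs(pair) for pair in scan_pairs]

# Show which commands each action sequence maps back to.
for source, targets in group_by_source(nacs_pairs).items():
    print(f"{source!r} -> {targets}")
```

In this toy sample, the action sequence `I_JUMP I_JUMP` is paired with two different commands, so in the reversed direction a single input can have more than one valid output. Grouping by source is one simple way to inspect that property in a set of pairs.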

Code & Models

Repositories

facebookresearch/NACS
PyTorch · Official
