Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking

Ronen Tamari; Kyle Richardson; Aviad Sar-Shalom; Noam Kahlon; Nelson Liu; Reut Tsarfaty; Dafna Shahaf

arXiv:2112.00086·cs.CL·December 2, 2021

Open Access

TL;DR

This paper introduces Dyna-bAbI, a flexible synthetic benchmarking framework for story understanding, revealing limitations of current models in compositional generalization and emphasizing the need for controllable task generation.

Contribution

We develop Dyna-bAbI, a framework that gives systematic control over bAbI task generation, and demonstrate it by constructing three new compositional generalization tasks that expose the limitations of current models.

Findings

01 State-of-the-art models fail in compositional generalization tasks

02 Data diversification improves robustness but is insufficient alone

03 Controllable synthetic benchmarks are crucial for advancing NLU

Abstract

While neural language models often perform surprisingly well on natural language understanding (NLU) tasks, their strengths and limitations remain poorly understood. Controlled synthetic tasks are thus an increasingly important resource for diagnosing model behavior. In this work we focus on story understanding, a core competency for NLU systems. However, the main synthetic resource for story understanding, the bAbI benchmark, lacks such a systematic mechanism for controllable task generation. We develop Dyna-bAbI, a dynamic framework providing fine-grained control over task generation in bAbI. We demonstrate our ideas by constructing three new tasks requiring compositional generalization, an important evaluation setting absent from the original benchmark. We tested both special-purpose models developed for bAbI as well as state-of-the-art pre-trained methods, and found that while both…
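
To make "controllable task generation" and "compositional generalization" concrete, here is a minimal toy sketch. It is not the Dyna-bAbI API: the names `generate_story`, `where_is_object`, the `concepts` parameter, and the small vocabularies are all hypothetical, chosen only for illustration. The idea is that the caller selects which event types may appear in a story, so training stories can cover concepts separately while a test question requires combining them.

```python
import random

# Hypothetical toy vocabulary; not taken from the bAbI or Dyna-bAbI code.
PEOPLE = ["Mary", "John", "Sandra"]
PLACES = ["kitchen", "garden", "office"]
OBJECTS = ["apple", "football"]


def generate_story(concepts, n_events, seed):
    """Build a bAbI-style story using only the event types in `concepts`.

    Returns the story lines plus the final world state:
    `location` maps person -> place, `holder` maps object -> person.
    """
    rng = random.Random(seed)  # seeded so the same call repeats the same story
    location, holder, lines = {}, {}, []
    # Bounded number of attempts, so the loop ends even if no event is possible.
    for _ in range(100 * n_events):
        if len(lines) >= n_events:
            break
        kind = rng.choice(sorted(concepts))
        person = rng.choice(PEOPLE)
        if kind == "move":
            place = rng.choice(PLACES)
            location[person] = place
            lines.append(f"{person} went to the {place}.")
        elif kind == "grab" and person in location:
            free = [obj for obj in OBJECTS if obj not in holder]
            if free:
                obj = rng.choice(free)
                holder[obj] = person
                lines.append(f"{person} picked up the {obj}.")
    return lines, location, holder


def where_is_object(obj, location, holder):
    """Compose two facts: who holds the object, and where that person is."""
    person = holder.get(obj)
    return location.get(person) if person is not None else None


# Training-style story: movement only.
train_story, _, _ = generate_story({"move"}, n_events=4, seed=0)
# Test-style story: movement and grabbing combined in one story.
test_story, test_location, test_holder = generate_story(
    {"move", "grab"}, n_events=6, seed=1
)
```

In this sketch, a question such as `where_is_object("apple", test_location, test_holder)` can only be answered by chaining a possession fact with a location fact, which is the kind of combination a compositional evaluation setting is meant to probe.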

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)