Dyna-bAbI: unlocking bAbI's potential with dynamic synthetic benchmarking
Ronen Tamari, Kyle Richardson, Aviad Sar-Shalom, Noam Kahlon, Nelson Liu, Reut Tsarfaty, Dafna Shahaf

TL;DR
This paper introduces Dyna-bAbI, a flexible synthetic benchmarking framework for story understanding, revealing limitations of current models in compositional generalization and emphasizing the need for controllable task generation.
Contribution
We developed Dyna-bAbI, enabling systematic control over bAbI tasks, and demonstrated its effectiveness by creating new compositional tasks that expose current models' limitations.
Findings
State-of-the-art models fail in compositional generalization tasks
Data diversification improves robustness but is insufficient alone
Controllable synthetic benchmarks are crucial for advancing NLU
Abstract
While neural language models often perform surprisingly well on natural language understanding (NLU) tasks, their strengths and limitations remain poorly understood. Controlled synthetic tasks are thus an increasingly important resource for diagnosing model behavior. In this work we focus on story understanding, a core competency for NLU systems. However, the main synthetic resource for story understanding, the bAbI benchmark, lacks such a systematic mechanism for controllable task generation. We develop Dyna-bAbI, a dynamic framework providing fine-grained control over task generation in bAbI. We demonstrate our ideas by constructing three new tasks requiring compositional generalization, an important evaluation setting absent from the original benchmark. We tested both special-purpose models developed for bAbI as well as state-of-the-art pre-trained methods, and found that while both…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
