Procedural Reasoning Networks for Understanding Multimodal Procedures

Mustafa Sercan Amac; Semih Yagcioglu; Aykut Erdem; Erkut Erdem

arXiv:1909.08859 · cs.CL · September 20, 2019

TL;DR

This paper introduces a neural comprehension model that combines multimodal input with an external relational memory to understand procedural instructions. Without relying on strong inductive biases, it improves accuracy on the RecipeQA visual reasoning tasks by a large margin over previously reported models.

Contribution

We propose a novel entity-aware neural model with external relational memory that dynamically updates entity states during multimodal procedural comprehension, outperforming previously reported models on the RecipeQA visual reasoning tasks.

Findings

01. Improved accuracy on RecipeQA visual reasoning tasks.

02. Effective dynamic entity representations learned without supervision.

03. Model exploits multimodality to enhance semantic understanding.

Abstract

This paper addresses the problem of comprehending procedural commonsense knowledge. This is a challenging task, as it requires identifying key entities, keeping track of their state changes, and understanding temporal and causal relations. In contrast to most previous work, in this study we do not rely on strong inductive biases, and we explore how multimodality can be exploited to provide a complementary semantic signal. To this end, we introduce a new entity-aware neural comprehension model augmented with external relational memory units. Our model learns to dynamically update entity states in relation to each other while reading the text instructions. Our experimental analysis on the visual reasoning tasks in the recently proposed RecipeQA dataset reveals that our approach improves on the accuracy of previously reported models by a large margin. Moreover, we find…
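To make the idea of updating entity states "in relation to each other" while reading more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`update_entity_memory`, `read_procedure`), the unparameterized dot-product attention, the scalar gate, and the random vectors standing in for encoded entities and steps are all assumptions made for illustration. A trained model would use learned projections and real text and image encoders.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def update_entity_memory(memory, step_vec):
    """One hypothetical memory update for a single instruction step.

    memory:   list of entity state vectors (one slot per entity)
    step_vec: encoding of the current step, same dimensionality
    """
    dim = len(step_vec)
    scale = math.sqrt(dim)
    # Each slot attends over every entity slot plus the step encoding,
    # so an entity's new state depends on the other entities.
    context = memory + [step_vec]
    updated = []
    for state in memory:
        weights = softmax([dot(state, c) / scale for c in context])
        candidate = [
            sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)
        ]
        # Assumed scalar gate: how strongly this step relates to this entity.
        gate = 1.0 / (1.0 + math.exp(-dot(state, step_vec) / scale))
        updated.append(
            [(1.0 - gate) * s + gate * c for s, c in zip(state, candidate)]
        )
    return updated


def read_procedure(entity_vecs, step_vecs):
    """Read steps in order and return the entity states after each step."""
    memory = [list(v) for v in entity_vecs]
    history = []
    for step_vec in step_vecs:
        memory = update_entity_memory(memory, step_vec)
        history.append(memory)
    return history


# Toy stand-ins for encoded entities and steps (assumed, not real data).
rng = random.Random(0)
entities = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
steps = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
history = read_procedure(entities, steps)
print(len(history), len(history[0]), len(history[0][0]))  # 2 3 4
```

The sketch keeps one state vector per entity and returns a snapshot after every step, which is the kind of per-step entity representation a downstream question-answering component could read from. The gate is a simple heuristic here; it only illustrates that entities unrelated to a step can remain largely unchanged.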
