Blocksworld Revisited: Learning and Reasoning to Generate Event-Sequences from Image Pairs
Tejas Gokhale, Shailaja Sampat, Zhiyuan Fang, Yezhou Yang, Chitta Baral

TL;DR
This paper introduces the challenging task of Image-based Event-Sequencing (IES), which involves predicting action sequences to transform one scene into another, emphasizing the importance of reasoning and generalization in visual understanding.
Contribution
The paper presents the first IES dataset (BIRD), evaluates existing deep learning methods, and proposes a modular perception and reasoning approach that improves event-sequence prediction and generalization.
Findings
End-to-end deep learning models underperform in event-sequence inference.
A modular perception and reasoning approach improves accuracy.
Extension to natural images demonstrates potential for real-world applications.
Abstract
The process of identifying changes or transformations in a scene, along with the ability to reason about their causes and effects, is a key aspect of intelligence. In this work we go beyond recent advances in computational perception and introduce a more challenging task, Image-based Event-Sequencing (IES). In IES, the task is to predict the sequence of actions required to rearrange objects from the configuration in an input source image to the one in the target image. IES also requires systems to possess inductive generalizability. Motivated by evidence in cognitive development, we compile the first IES dataset, the Blocksworld Image Reasoning Dataset (BIRD), which contains images of wooden blocks in different configurations and the sequence of moves to rearrange one configuration into the other. We first explore the use of existing deep learning architectures and show that these…
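To make the IES task concrete, the following is a minimal sketch of the reasoning half only: given two block configurations that are assumed to have already been extracted from the source and target images, it searches for a shortest sequence of moves between them. This is a plain breadth-first search written for illustration, not the method proposed in the paper. The function names (`canonical`, `successors`, `plan_moves`), the representation of a configuration as stacks listed bottom to top, and the choice to ignore where a stack sits on the table are all assumptions.

```python
from collections import deque


def canonical(stacks):
    """Drop empty stacks and sort, so table position is ignored (an assumption)."""
    return tuple(sorted(tuple(s) for s in stacks if s))


def successors(state):
    """Yield (move, next_state) pairs; a move is (block, destination)."""
    for i, src in enumerate(state):
        block = src[-1]              # only the top block of a stack can move
        remaining = src[:-1]         # what is left of the source stack
        rest = [s for k, s in enumerate(state) if k != i]
        # Move the block to the table, unless it already sits there alone.
        if remaining:
            yield (block, "table"), canonical(rest + [remaining, (block,)])
        # Move the block onto the top of any other stack.
        for j, dst in enumerate(rest):
            moved = rest[:j] + [dst + (block,)] + rest[j + 1:]
            yield (block, dst[-1]), canonical(moved + [remaining])


def plan_moves(source, target):
    """Return a shortest list of moves from source to target, or None."""
    start, goal = canonical(source), canonical(target)
    if start == goal:
        return []
    parent = {start: None}           # state -> (previous state, move taken)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for move, nxt in successors(state):
            if nxt in parent:
                continue
            parent[nxt] = (state, move)
            if nxt == goal:
                # Walk back through the parents to recover the move sequence.
                moves, node = [], nxt
                while parent[node] is not None:
                    node, taken = parent[node]
                    moves.append(taken)
                return moves[::-1]
            queue.append(nxt)
    return None                      # goal unreachable, e.g. different blocks


if __name__ == "__main__":
    # Block B sits on A; the target has A on B.
    print(plan_moves([("A", "B")], [("B", "A")]))
    # [('B', 'table'), ('A', 'B')]
```

Exhaustive search like this only works for small numbers of blocks, since the number of configurations grows quickly; it is meant to show what an event sequence looks like, not how to scale the inference.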
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
