Learning Action-Effect Dynamics from Pairs of Scene-graphs
Shailaja Keyur Sampat, Pratyay Banerjee, Yezhou Yang, Chitta Baral

TL;DR
This paper introduces a novel method that uses scene-graph representations of images to reason about the effects of actions described in natural language, demonstrating improved performance and generalization on the CLEVR_HYP dataset.
Contribution
The paper presents a new approach that leverages scene-graphs for reasoning about action effects, improving data efficiency and generalization over existing models.
Findings
Effective in reasoning about action effects from scene-graphs
Improves performance and generalization on CLEVR_HYP dataset
Requires less data than previous models
Abstract
'Actions' play a vital role in how humans interact with the world. Thus, autonomous agents that would assist us in everyday tasks also require the capability to perform 'Reasoning about Actions & Change' (RAC). Recently, there has been growing interest in the study of RAC with visual and linguistic inputs. Graphs are often used to represent the semantic structure of visual content (i.e., objects, their attributes, and the relationships among objects); such graphs are commonly referred to as scene-graphs. In this work, we propose a novel method that leverages the scene-graph representation of images to reason about the effects of actions described in natural language. We experiment with the existing CLEVR_HYP dataset (Sampat et al., 2021) and show that our proposed approach is effective in terms of performance, data efficiency, and generalization capability compared to existing models.
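The abstract defines a scene-graph as objects, their attributes, and the relationships among them, and the title frames action effects in terms of pairs of scene-graphs (before and after an action). The Python below is a minimal illustrative sketch of that data structure only, not the authors' model, which learns action-effect dynamics rather than hard-coding them. The `SceneGraph` class, the rule-based `apply_recolor` action, and the `count` and `diff` helpers are all hypothetical names chosen for illustration.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Hypothetical representation: object id -> attribute dict,
    # e.g. {"color": "red", "shape": "cube"}.
    objects: dict[int, dict[str, str]] = field(default_factory=dict)
    # Relationships as (subject id, relation, object id) triples,
    # e.g. (0, "left", 1) means object 0 is left of object 1.
    relations: set[tuple[int, str, int]] = field(default_factory=set)


def _matches(attrs: dict[str, str], selector: dict[str, str]) -> bool:
    # True if every selector key/value pair appears in the attributes.
    return all(attrs.get(k) == v for k, v in selector.items())


def apply_recolor(graph: SceneGraph, selector: dict[str, str], new_color: str) -> SceneGraph:
    """Hypothetical hand-written action: recolor every matching object.

    Returns a new graph so the (before, after) pair is preserved.
    """
    after = copy.deepcopy(graph)
    for attrs in after.objects.values():
        if _matches(attrs, selector):
            attrs["color"] = new_color
    return after


def count(graph: SceneGraph, **query: str) -> int:
    # Answer a counting question such as "how many green cubes are there?"
    return sum(_matches(attrs, query) for attrs in graph.objects.values())


def diff(before: SceneGraph, after: SceneGraph) -> dict[int, tuple[dict, dict]]:
    # Recover an action's effect as the objects whose attributes changed.
    return {
        oid: (attrs, after.objects.get(oid))
        for oid, attrs in before.objects.items()
        if attrs != after.objects.get(oid)
    }


if __name__ == "__main__":
    before = SceneGraph(
        objects={
            0: {"color": "red", "shape": "cube"},
            1: {"color": "red", "shape": "sphere"},
            2: {"color": "blue", "shape": "cube"},
        },
        relations={(0, "left", 1), (1, "left", 2)},
    )
    after = apply_recolor(before, {"color": "red"}, "green")
    print(count(after, color="green", shape="cube"))  # 1
    print(sorted(diff(before, after)))  # [0, 1]
```

Returning a copy from `apply_recolor` rather than mutating in place keeps the pre-action graph available, which is what makes a before/after pair (and the `diff` between them) possible in this sketch.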
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
