Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task

Shailaja Keyur Sampat; Pratyay Banerjee; Yezhou Yang; Chitta Baral

arXiv:2212.03866 · cs.CV · December 9, 2022

Open Access · 1 Repo

TL;DR

This paper introduces a new learning strategy for modeling action effects in vision-language reasoning tasks, enhancing understanding of hypothetical scenarios involving actions and changes.

Contribution

The paper proposes an encoder-decoder architecture that learns vector representations of actions, and combines it with existing modality parsers and a scene graph question answering model for evaluation on the CLEVR_HYP dataset.

Findings

1. Improved reasoning accuracy over baselines
2. Enhanced data efficiency in learning
3. Better generalization to unseen scenarios

Abstract

'Actions' play a vital role in how humans interact with the world. Thus, autonomous agents that would assist us in everyday tasks also require the capability to perform 'Reasoning about Actions & Change' (RAC). This has been an important research direction in Artificial Intelligence (AI) in general, but the study of RAC with visual and linguistic inputs is relatively recent. CLEVR_HYP (Sampat et al., 2021) is one such testbed for hypothetical vision-language reasoning, with actions as the key focus. In this work, we propose a novel learning strategy that can improve reasoning about the effects of actions. We implement an encoder-decoder architecture to learn the representation of actions as vectors. We combine this encoder-decoder architecture with existing modality parsers and a scene graph question answering model to evaluate our proposed system on the CLEVR_HYP…
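
The pipeline described in the abstract (parse the inputs, model the effect of the action, then answer a question over the resulting scene graph) can be illustrated with a toy sketch. This is not the authors' implementation: the paper learns action effects as vectors with an encoder-decoder, whereas the sketch below swaps in a hand-written symbolic rule so it stays short and runnable. All names (`Obj`, `parse_image`, `apply_action`, `count`), the action format, and the scene contents are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Obj:
    """One node of a toy scene graph (attributes only, no relations)."""
    shape: str
    color: str
    size: str


Scene = List[Obj]


def parse_image() -> Scene:
    # Assumption: stand-in for a visual parser; returns a hard-coded scene.
    return [
        Obj("cube", "red", "small"),
        Obj("sphere", "green", "small"),
        Obj("cube", "blue", "large"),
    ]


def matches(obj: Obj, attrs: Dict[str, str]) -> bool:
    # True when the object has every requested attribute value.
    return all(getattr(obj, k) == v for k, v in attrs.items())


def apply_action(scene: Scene, action: dict) -> Scene:
    # Assumption: hand-written effect rule. The paper learns this step
    # with an encoder-decoder over vector representations instead.
    if action["type"] == "remove":
        return [o for o in scene if not matches(o, action["target"])]
    if action["type"] == "change":
        return [
            replace(o, **action["to"]) if matches(o, action["target"]) else o
            for o in scene
        ]
    raise ValueError(f"unsupported action type: {action['type']}")


def count(scene: Scene, **attrs: str) -> int:
    # Toy scene graph question answering: count objects with given attributes.
    return sum(matches(o, attrs) for o in scene)


scene = parse_image()
# Hypothetical parsed form of: "Paint the green sphere red."
action = {
    "type": "change",
    "target": {"shape": "sphere", "color": "green"},
    "to": {"color": "red"},
}
hypothetical_scene = apply_action(scene, action)
# Question: "How many red objects are there?" asked of the hypothetical scene.
answer = count(hypothetical_scene, color="red")
```

The question is answered over `hypothetical_scene` rather than the original `scene`, which is the defining feature of the hypothetical reasoning task: the original scene is left unchanged and only the imagined result of the action is queried.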

Code & Models

Repositories

shailaja183/arl
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques