Cognitive Visual Commonsense Reasoning Using Dynamic Working Memory

Xuejiao Tang; Xin Huang; Wenbin Zhang; Travers B. Child; Qiong Hu; Zhen Liu; Ji Zhang

arXiv:2107.01671·cs.CV·December 11, 2023

TL;DR

This paper introduces a dynamic working memory model for visual commonsense reasoning, enhancing inference accuracy and interpretability in VCR tasks by storing and utilizing accumulated prior knowledge.

Contribution

The paper presents a novel dynamic working memory approach that improves generalization and interpretability in visual commonsense reasoning tasks.

Findings

01 Significant performance improvements on the VCR benchmark dataset.

02 Enhanced interpretability of the reasoning process.

03 Effective utilization of prior knowledge in inference.

Abstract

Visual Commonsense Reasoning (VCR) predicts an answer with a corresponding rationale, given a question-image input. VCR is a recently introduced visual scene understanding task with a wide range of applications, including visual question answering, automated vehicle systems, and clinical decision support. Previous approaches to the VCR task generally rely on pre-training or on memory models that encode long-range dependency relationships. However, these approaches generalize poorly and lack prior knowledge. In this paper we propose a cognitive VCR network based on dynamic working memory, which stores commonsense accumulated between sentences to provide prior knowledge for inference. Extensive experiments show that the proposed model yields significant improvements over existing methods on the benchmark VCR dataset. Moreover, the proposed model provides intuitive…
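To make the idea of a working memory that is read for prior knowledge and then updated more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (the official repository is in PyTorch and its architecture is not described on this page). The function names `read_memory` and `write_memory`, the dot-product attention, the update rule, and the `rate` parameter are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query, memory):
    """Hypothetical read step: attend over memory slots with dot-product scores.

    Returns the attention weights and the weighted sum of the slots,
    which stands in for the "prior knowledge" retrieved for this query.
    """
    scores = [sum(q * s for q, s in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    readout = [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(len(query))
    ]
    return weights, readout


def write_memory(query, memory, weights, rate=0.5):
    """Hypothetical write step: move each slot toward the query.

    Slots that received more attention move further. A new list is
    returned, so the original memory is left untouched.
    """
    return [
        [s + rate * w * (q - s) for q, s in zip(query, slot)]
        for w, slot in zip(weights, memory)
    ]


# Toy data: two memory slots and one sentence embedding (all made up).
memory = [[1.0, 0.0], [0.0, 1.0]]
query = [2.0, 0.0]

weights, readout = read_memory(query, memory)
updated = write_memory(query, memory, weights)
```

In this toy setup the first slot is more similar to the query, so it receives the larger attention weight and is pulled further toward the query on the write step. A real model would learn the embeddings and update rule rather than fix them by hand.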

Code & Models

Repositories

tanjatang/DMVCR (PyTorch, official)
