Neuro-Symbolic Visual Dialog

Adnen Abdessaied; Mihai Bâce; Andreas Bulling

arXiv:2208.10353 · cs.CV · August 23, 2022 · 1 citation

TL;DR

This paper introduces Neuro-Symbolic Visual Dialog (NSVD), a novel approach that combines deep learning with symbolic reasoning to improve multi-round visual dialog, particularly co-reference resolution and answer accuracy.

Contribution

It presents the first neuro-symbolic method for visual dialog, achieving state-of-the-art accuracy with only a fraction of the training data, along with better robustness and generalization.
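The general neuro-symbolic idea can be illustrated with a toy sketch. It assumes that a perception module has already turned the image into a symbolic table of objects and that a parser has already turned each question into a program. The operation names (`scene`, `filter`, `refer`, `count`, `query`), the `DialogMemory` class and the scene format are all hypothetical and are not taken from the NSVD code base. The sketch only shows how executing programs over a symbolic scene, together with a small memory of earlier referents, lets a later question such as "What color is it?" be resolved against an earlier round.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical symbolic scene: one attribute dict per object, standing in for the
# structured representation a perception module would extract from the image.
Scene = List[Dict[str, str]]
Program = List[Tuple[str, str]]


@dataclass
class DialogMemory:
    """Hypothetical dialog memory: remembers which objects the previous round
    referred to, so a pronoun such as "it" in a later round can be resolved."""
    last_referents: Optional[List[int]] = None


def execute(program: Program, scene: Scene, memory: DialogMemory) -> Union[int, str, List[int]]:
    """Run a program of (operation, argument) steps over the scene.

    The working state is a list of object indices. `count` and `query` return
    an answer; otherwise the remaining object indices are returned.
    """
    state: List[int] = list(range(len(scene)))
    for op, arg in program:
        if op == "scene":  # start from every object in the scene
            state = list(range(len(scene)))
        elif op == "refer":  # co-reference: reuse the objects from the previous round
            if memory.last_referents is None:
                raise ValueError("no antecedent in the dialog history")
            state = list(memory.last_referents)
        elif op == "filter":  # keep objects whose attribute matches, e.g. "shape=sphere"
            key, value = arg.split("=")
            state = [i for i in state if scene[i].get(key) == value]
        elif op == "count":
            memory.last_referents = state
            return len(state)
        elif op == "query":  # read one attribute of a single object
            if len(state) != 1:
                raise ValueError(f"query expects one object, got {len(state)}")
            memory.last_referents = state
            return scene[state[0]][arg]
        else:
            raise ValueError(f"unknown operation: {op}")
    memory.last_referents = state
    return state


scene: Scene = [
    {"color": "red", "shape": "cube"},
    {"color": "blue", "shape": "sphere"},
    {"color": "red", "shape": "cylinder"},
]
memory = DialogMemory()
# Round 1: "How many spheres are there?"
print(execute([("scene", ""), ("filter", "shape=sphere"), ("count", "")], scene, memory))  # 1
# Round 2: "What color is it?"  ("it" resolves to the sphere from round 1)
print(execute([("refer", ""), ("query", "color")], scene, memory))  # blue
```

In this toy version, resolving "it" amounts to an explicit lookup in `DialogMemory`, and the lookup would work equally well many rounds later. How NSVD itself tracks referents across rounds is described in the paper and its repository, not in this sketch.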

Findings

01. Achieves 99.72% accuracy on CLEVR-Dialog, surpassing previous methods.
02. Demonstrates improved robustness to incomplete dialog histories.
03. Generalizes well to longer dialogs and unseen question types.

Abstract

We propose Neuro-Symbolic Visual Dialog (NSVD), the first method to combine deep learning and symbolic program execution for multi-round visually-grounded reasoning. NSVD significantly outperforms existing purely-connectionist methods on two key challenges inherent to visual dialog: long-distance co-reference resolution as well as vanishing question-answering performance. We demonstrate the latter by proposing a more realistic and stricter evaluation scheme in which we use predicted answers for the full dialog history when calculating accuracy. We describe two variants of our model and show that using this new scheme, our best model achieves an accuracy of 99.72% on CLEVR-Dialog, a relative improvement of more than 10% over the state of the art while only requiring a fraction of training data. Moreover, we demonstrate that our neuro-symbolic models have a higher mean first failure…
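The difference between the usual evaluation and the stricter scheme described in the abstract can be sketched as follows. This is an illustrative contrast, not the paper's evaluation code. It assumes a model that conditions each round on the answers to earlier rounds. The toy dialog, `toy_model`, and both function names are hypothetical. When the history contains ground-truth answers, one early mistake costs a single round. When the history contains the model's own predictions, that same mistake is fed into every later round.

```python
from typing import Callable, List

# A model maps (round index, answers to the previous rounds) -> predicted answer.
Model = Callable[[int, List[int]], int]


def accuracy_with_gt_history(model: Model, gt: List[int]) -> float:
    """Lenient scheme: each round conditions on the ground-truth answers of earlier rounds."""
    correct = sum(model(t, gt[:t]) == gt[t] for t in range(len(gt)))
    return correct / len(gt)


def accuracy_with_predicted_history(model: Model, gt: List[int]) -> float:
    """Stricter scheme: each round conditions on the model's own earlier predictions,
    so an early mistake can propagate to later rounds."""
    preds: List[int] = []
    for t in range(len(gt)):
        preds.append(model(t, list(preds)))  # history = predictions for rounds 0..t-1
    correct = sum(p == g for p, g in zip(preds, gt))
    return correct / len(gt)


# Hypothetical toy dialog: each answer is the previous answer plus one, starting at 3.
gt_answers = [3, 4, 5, 6]


def toy_model(t: int, history: List[int]) -> int:
    """Hypothetical model that builds on the last answer in its history and
    makes a single deliberate mistake in round 1."""
    if t == 0:
        return 3
    step = 2 if t == 1 else 1
    return history[-1] + step


print(accuracy_with_gt_history(toy_model, gt_answers))         # 0.75
print(accuracy_with_predicted_history(toy_model, gt_answers))  # 0.25
```

The two scores come from the same single mistake. Under the lenient scheme the later rounds recover because they see the correct answer. Under the stricter scheme the wrong answer from round 1 carries forward, which is the kind of degradation over the dialog that the stricter scheme is meant to expose.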

Code & Models

Repositories

adnenabdessaied/NSVD
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization