Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention

Leon Bergen; Dzmitry Bahdanau; Timothy J. O'Donnell

arXiv:2104.06645 · cs.CL · April 15, 2021 · 1 citation

TL;DR

This paper introduces a neurosymbolic model that uses parallel attention to jointly learn word denotations and their groundings. It achieves state-of-the-art visual question answering performance on CLEVR, learning to detect and ground objects in images with question-answering performance as the only training signal.

Contribution

It proposes a novel parallel attention mechanism for jointly learning denotations and groundings within a truth-conditional semantic framework.

Findings

01. Achieves state-of-the-art VQA performance on CLEVR.

02. Learns to ground objects using only question-based training signals.

03. Can adapt to non-canonical groundings by modifying training answers.

Abstract

We present a model that jointly learns the denotations of words together with their groundings using a truth-conditional semantics. Our model builds on the neurosymbolic approach of Mao et al. (2019), learning to ground objects in the CLEVR dataset (Johnson et al., 2017) using a novel parallel attention mechanism. The model achieves state-of-the-art performance on visual question answering, learning to detect and ground objects with question performance as the only training signal. We also show that the model is able to learn flexible non-canonical groundings just by adjusting answers to questions in the training set.
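To make the phrase "truth-conditional denotation" concrete, the following is a minimal toy sketch, not the paper's model: it has no images, no learning, and no parallel attention. It assumes a hand-written scene and treats each word's denotation as a function from an object to a degree of truth in [0, 1], so that a question is answered by composing those functions. The scene, the function names, and the product and noisy-or combination rules are all illustrative assumptions.

```python
from math import prod

# Hypothetical toy scene: each object is a dict of attribute -> value.
SCENE = [
    {"color": "red", "shape": "cube"},
    {"color": "blue", "shape": "sphere"},
    {"color": "red", "shape": "sphere"},
]

def denotation(word):
    """Return a soft truth function: object -> degree of truth in [0, 1]."""
    def truth(obj):
        # Hand-coded for illustration; a learned model would predict this score.
        return 1.0 if word in obj.values() else 0.0
    return truth

def conjoin(*words):
    """Denotation of a conjunction such as 'red cube': product of word scores."""
    fns = [denotation(w) for w in words]
    return lambda obj: prod(f(obj) for f in fns)

def exists(truth, scene):
    """Soft 'is there a ...?': 1 minus the chance that no object satisfies it."""
    return 1.0 - prod(1.0 - truth(obj) for obj in scene)

def count(truth, scene):
    """Soft 'how many ...?': sum of per-object truth scores."""
    return sum(truth(obj) for obj in scene)

if __name__ == "__main__":
    print(exists(conjoin("red", "cube"), SCENE))
    print(count(denotation("red"), SCENE))
```

In this sketch the answer to a question depends only on the per-object truth scores, which is why, in a trainable version, supervision on answers alone could in principle shape what each word picks out. Changing the training answers would then change the learned scores, which is one way to read the abstract's remark about non-canonical groundings.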

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling