Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering

Yu-Jung Heo; Eun-Sol Kim; Woo Suk Choi; Byoung-Tak Zhang

arXiv:2204.10448 · cs.CV · April 25, 2022 · 1 cites

TL;DR

The paper introduces Hypergraph Transformer, a novel model that uses hypergraphs to improve multi-hop reasoning in knowledge-based visual and textual question answering under weak supervision.

Contribution

It proposes a hypergraph-based approach to encode high-level semantics and high-order associations for better reasoning in knowledge-based QA tasks.

Findings

01. Outperforms existing methods on multiple knowledge-based QA datasets.

02. Effectively captures high-order semantics and multi-hop reasoning.

03. Demonstrates significant improvements, especially in complex reasoning scenarios.

Abstract

Knowledge-based visual question answering (QA) aims to answer a question that requires visually grounded external knowledge beyond the image content itself. Answering complex questions that require multi-hop reasoning under weak supervision is considered a challenging problem, since i) no supervision is given to the reasoning process and ii) high-order semantics of multi-hop knowledge facts need to be captured. In this paper, we introduce the concept of a hypergraph to encode high-level semantics of a question and a knowledge base, and to learn high-order associations between them. The proposed model, Hypergraph Transformer, constructs a question hypergraph and a query-aware knowledge hypergraph, and infers an answer by encoding inter-associations between the two hypergraphs and intra-associations within each hypergraph. Extensive experiments on two knowledge-based visual QA and two…
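The inter- and intra-association encoding described in the abstract can be illustrated with plain scaled dot-product attention over hyperedge embeddings. The following is a minimal sketch, not the authors' implementation: the function names `softmax` and `attend`, the toy embeddings, the absence of learned projections, and the choice to apply cross-attention before self-attention are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-width vectors.

    No learned projections are used here (an assumption made to keep
    the sketch short); a trained model would project the inputs first.
    """
    scale = math.sqrt(len(keys[0]))
    width = len(values[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(width)]
        )
    return out


# Hypothetical hyperedge embeddings: 2 question hyperedges and
# 3 knowledge hyperedges, each a 2-dimensional vector.
question_edges = [[1.0, 0.0], [0.0, 1.0]]
knowledge_edges = [[1.0, 1.0], [0.5, -0.5], [0.0, 2.0]]

# Inter-association: each hypergraph attends to the other one.
q_given_k = attend(question_edges, knowledge_edges, knowledge_edges)
k_given_q = attend(knowledge_edges, question_edges, question_edges)

# Intra-association: self-attention within each hypergraph.
q_self = attend(q_given_k, q_given_k, q_given_k)
k_self = attend(k_given_q, k_given_q, k_given_q)
```

Each output row is a convex combination of the attended vectors, so `q_given_k` keeps one row per question hyperedge and `k_given_q` keeps one row per knowledge hyperedge.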

Code & Models

Repositories

yujungheo/kbvqa-public (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Label Smoothing · Dropout · Adam