Hypergraph Transformer: Weakly-supervised Multi-hop Reasoning for Knowledge-based Visual Question Answering
Yu-Jung Heo, Eun-Sol Kim, Woo Suk Choi, Byoung-Tak Zhang

TL;DR
The paper introduces Hypergraph Transformer, a novel model that uses hypergraphs to improve multi-hop reasoning in knowledge-based visual and textual question answering under weak supervision.
Contribution
It proposes representing both the question and the knowledge base as hypergraphs, so the model can encode their high-level semantics and learn high-order associations between them for multi-hop reasoning in knowledge-based QA tasks.
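The hypergraph idea can be made concrete with a toy sketch. It rests on simplifying assumptions that are not taken from the paper. Each question hyperedge is treated as a contiguous n-gram of question tokens. Each knowledge hyperedge is treated as a path of up to `max_hops` facts that starts from an entity linked to the question (a "seed"). The function names `question_hyperedges` and `knowledge_hyperedges` are hypothetical, and the paper's exact construction may differ.

```python
from __future__ import annotations

from collections import defaultdict


def question_hyperedges(tokens: list[str], n: int = 2) -> list[tuple[str, ...]]:
    """Hypothetical question hypergraph: each contiguous n-gram of tokens is one hyperedge."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def knowledge_hyperedges(
    triples: list[tuple[str, str, str]], seeds: set[str], max_hops: int = 2
) -> list[tuple[str, ...]]:
    """Hypothetical query-aware knowledge hypergraph.

    Every path of 1..max_hops facts that starts at a seed entity becomes one
    hyperedge, stored as the alternating sequence of entities and relations.
    """
    # Index outgoing facts by head entity.
    outgoing = defaultdict(list)
    for head, relation, tail in triples:
        outgoing[head].append((relation, tail))

    edges = []
    frontier = [(seed,) for seed in sorted(seeds)]
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            for relation, tail in outgoing[path[-1]]:
                if tail in path[0::2]:  # entities sit at even positions; skip cycles
                    continue
                extended = path + (relation, tail)
                edges.append(extended)
                next_frontier.append(extended)
        frontier = next_frontier
    return edges


if __name__ == "__main__":
    # Toy knowledge base; entity and relation names are made up.
    kb = [
        ("PersonA", "spouse", "PersonB"),
        ("PersonB", "born_in", "CityC"),
        ("CityC", "located_in", "CountryD"),
    ]
    print(question_hyperedges(["where", "was", "his", "wife", "born"]))
    print(knowledge_hyperedges(kb, {"PersonA"}, max_hops=2))
```

With `max_hops=2`, the seed `PersonA` yields two hyperedges: the one-hop fact and the two-hop path through `PersonB`. The fact about `CityC` lies three hops away and is excluded. Treating a whole path as a single hyperedge is what lets a multi-hop chain be scored as one unit rather than as separate pairwise edges.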
Findings
Outperforms existing methods on multiple knowledge-based QA datasets.
Effectively captures high-order semantics and supports multi-hop reasoning.
Shows especially large improvements on questions that require complex multi-hop reasoning.
Abstract
Knowledge-based visual question answering (QA) aims to answer a question that requires visually grounded external knowledge beyond the image content itself. Answering complex questions that require multi-hop reasoning under weak supervision is considered a challenging problem since i) no supervision is given to the reasoning process and ii) high-order semantics of multi-hop knowledge facts need to be captured. In this paper, we introduce the concept of a hypergraph to encode high-level semantics of a question and a knowledge base, and to learn high-order associations between them. The proposed model, Hypergraph Transformer, constructs a question hypergraph and a query-aware knowledge hypergraph, and infers an answer by encoding inter-associations between the two hypergraphs and intra-associations within each hypergraph. Extensive experiments on two knowledge-based visual QA and two…
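The inter- and intra-association encoding described in the abstract can be illustrated with plain scaled dot-product attention. This is a minimal sketch under strong assumptions that are not from the paper. Hyperedges are already embedded as fixed vectors. The sketch uses a single head with no learned query/key/value projections or feed-forward layers, and applies cross-attention before self-attention. The helper names `attend` and `hypergraph_block` are hypothetical, and this is not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention with no learned projections.

    Returns (outputs, weights): weights[i] is the softmax over keys for
    queries[i], and outputs[i] is the matching weighted sum of values.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        dim = len(values[0])
        outputs.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(dim)])
    return outputs, weights


def hypergraph_block(question_edges, knowledge_edges):
    """Hypothetical one-layer block over pre-embedded hyperedge vectors."""
    # Inter-associations: each hypergraph attends to the other one.
    q_inter, _ = attend(question_edges, knowledge_edges, knowledge_edges)
    k_inter, _ = attend(knowledge_edges, question_edges, question_edges)
    # Intra-associations: self-attention within each hypergraph.
    q_out, _ = attend(q_inter, q_inter, q_inter)
    k_out, _ = attend(k_inter, k_inter, k_inter)
    return q_out, k_out


if __name__ == "__main__":
    question = [[1.0, 0.0]]                # one toy question hyperedge embedding
    knowledge = [[1.0, 0.0], [0.0, 1.0]]   # two toy knowledge hyperedge embeddings
    _, w = attend(question, knowledge, knowledge)
    print(w)  # the first (aligned) knowledge hyperedge gets the larger weight
    print(hypergraph_block(question, knowledge))
```

Cross-attention lets each question hyperedge pull in the knowledge hyperedges most similar to it. Self-attention mixes information among hyperedges of the same graph. A trained model would learn the embeddings and projections that make these similarities meaningful.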
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Label Smoothing · Dropout · Adam
