Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering
Medhini Narasimhan, Alexander G. Schwing

TL;DR
This paper introduces a learning-based method for factual visual question answering that retrieves supporting facts from a knowledge base through a learned embedding space, outperforming competing methods by more than 5% on the recently introduced fact-based visual question answering dataset.
Contribution
It proposes a novel learned embedding approach for knowledge base retrieval in visual question answering, improving accuracy over keyword matching techniques.
Findings
Achieved state-of-the-art results on the fact-based visual question answering dataset.
Outperformed competing methods by more than 5%.
Addressed the misconceptions due to synonyms and homographs that keyword matching techniques are vulnerable to.
Abstract
Question answering is an important task for autonomous agents and virtual assistants alike and was shown to support the disabled in efficiently navigating an overwhelming environment. Many existing methods focus on observation-based questions, ignoring our ability to seamlessly combine observed content with general knowledge. To understand interactions with a knowledge base, a dataset has been introduced recently and keyword matching techniques were shown to yield compelling results despite being vulnerable to misconceptions due to synonyms and homographs. To address this issue, we develop a learning-based approach which goes straight to the facts via a learned embedding space. We demonstrate state-of-the-art results on the challenging recently introduced fact-based visual question answering dataset, outperforming competing methods by more than 5%.
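The contrast the abstract draws between keyword matching and retrieval in an embedding space can be illustrated with a minimal sketch. This is not the authors' model: the facts, the hand-set 3-dimensional vectors, and the function names `keyword_retrieve` and `embedding_retrieve` are all hypothetical stand-ins, and a real system would obtain the vectors from trained encoders rather than fixed numbers.

```python
import math

# Hypothetical toy knowledge base of (subject, relation, object) facts.
FACTS = [
    ("couch", "UsedFor", "sitting"),
    ("umbrella", "UsedFor", "staying dry"),
    ("bat", "IsA", "animal"),
]

# Hand-set vectors standing in for fact embeddings a trained model would produce.
FACT_VECS = [
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 0.1],
    [0.1, 0.0, 0.9],
]


def keyword_retrieve(question_words, facts):
    """Return indices of facts that share at least one exact word with the question."""
    hits = []
    for i, fact in enumerate(facts):
        fact_words = set(" ".join(fact).lower().split())
        if fact_words & question_words:
            hits.append(i)
    return hits


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def embedding_retrieve(query_vec, fact_vecs):
    """Return the index of the fact whose embedding is most similar to the query."""
    scores = [cosine(query_vec, f) for f in fact_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# The question says "sofa", but the knowledge base says "couch" (a synonym).
question_words = {"what", "is", "the", "sofa", "for"}

# Assumed embedding of the image-question pair; a trained encoder is
# presumed to place it near the "couch" fact despite the different wording.
query_vec = [0.8, 0.2, 0.1]

keyword_hits = keyword_retrieve(question_words, FACTS)
best = embedding_retrieve(query_vec, FACT_VECS)

print("keyword hits:", keyword_hits)
print("embedding match:", FACTS[best])
```

In this toy setup the exact-word lookup finds nothing because "sofa" never appears in the facts, while the similarity search still ranks the couch fact first. Whether a learned embedding behaves this way in practice depends entirely on how it is trained.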
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Text and Document Classification Technologies
