TL;DR
This paper introduces a question-focused filtering approach for knowledge-based VQA, improving accuracy by efficiently selecting relevant external knowledge using trainable modules.
Contribution
The paper proposes a trainable Question-Focused Filter (QFF) and a Chunk-based Dynamic Cross-Article Selection module (CDA) that together improve knowledge filtering in KB-VQA.
Findings
Outperforms the previous state of the art by 3.2% on Encyclopedic-VQA
Outperforms the previous state of the art by 2.2% on InfoSeek
Keeps inference efficient by using a shorter context
Abstract
Visual Question Answering (VQA) is the task of answering questions based on image content. Building upon this, Knowledge-Based VQA (KB-VQA) requires models to answer questions that depend on external knowledge beyond the visual content of an image. In such settings, effective knowledge filtering is essential for achieving high question answering accuracy. Typical filtering methods suffer from two issues: they fail to focus on the parts relevant to the question when encoding candidate sections, and they use similarity metrics to locate a section within a single article, which limits the information available for answering. To address these issues, this paper proposes a question-focused, cross-article filtering method. Specifically, we design a trainable Question-Focused Filter (QFF) and a Chunk-based Dynamic Cross-Article Selection module (CDA). This approach maintains inference time comparable to the optimal…
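The contrast the abstract draws between single-article section selection and cross-article chunk selection can be illustrated with a toy example. The sketch below is not the paper's QFF or CDA: it swaps the trainable filter for a plain bag-of-words cosine score, and every name in it (`score`, `chunk`, `select_single_article`, `select_cross_article`, the sample articles) is a hypothetical chosen for illustration.

```python
import math
from collections import Counter


def score(question, text):
    """Bag-of-words cosine similarity (an assumed stand-in for a learned scorer)."""
    a, b = Counter(question.lower().split()), Counter(text.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text, size=6):
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def select_single_article(question, articles):
    """Baseline: pick the best article first, then the best chunk inside it."""
    title = max(articles, key=lambda t: score(question, articles[t]))
    best = max(chunk(articles[title]), key=lambda c: score(question, c))
    return title, best


def select_cross_article(question, articles, k=2):
    """Pool chunks from every article and keep the k highest-scoring ones."""
    pool = [(title, c) for title, text in articles.items() for c in chunk(text)]
    ranked = sorted(pool, key=lambda tc: score(question, tc[1]), reverse=True)
    return ranked[:k]


# Toy data (invented for this example, not taken from any dataset).
articles = {
    "tower": "the tower was built in 1900 visitors can climb the stairs daily",
    "architect": "jane doe designed the tower in her early career years",
    "city": "the city has many parks",
}
question = "who designed the tower and when was it built"

single = select_single_article(question, articles)
cross = select_cross_article(question, articles, k=2)
```

In this toy setup the baseline returns one chunk from the "tower" article and never sees who designed it, while the pooled selection returns one chunk from "tower" and one from "architect", which is the kind of information limit the abstract attributes to single-article filtering.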
