QKVQA: Question-Focused Filtering for Knowledge-based VQA

Wei Ye, Yixin Su, Yueguo Chen, Longxiang Gao, Jianjun Li, Ruixuan Li, and Rui Zhang

arXiv:2601.13856·cs.IR·April 8, 2026

TL;DR

This paper introduces a question-focused filtering approach for knowledge-based VQA, improving accuracy by efficiently selecting relevant external knowledge using trainable modules.

Contribution

The paper proposes a trainable Question-Focused Filter (QFF) and a Chunk-based Dynamic Cross-Article Selection module (CDA) that together improve knowledge filtering in KB-VQA.

Findings

01. Outperforms the state of the art by 3.2% on Encyclopedic-VQA

02. Outperforms the state of the art by 2.2% on InfoSeek

03. Maintains inference efficiency with a shorter context length

Abstract

Visual Question Answering (VQA) is the task of answering questions based on image content. Building upon this, Knowledge-Based VQA (KB-VQA) requires models to answer questions that depend on external knowledge beyond the visual content of an image. In such settings, effective knowledge filtering is essential for achieving high question answering accuracy. Typical filtering methods suffer from two issues: they fail to focus on parts relevant to the question during candidate section encoding, and they use similarity metrics to locate a section from a single article, resulting in information limitation. To address these issues, this paper proposes a question-focused, cross-article filtering method. Specifically, we design a trainable Question-Focused Filter (QFF) and a Chunk-based Dynamic Cross-Article Selection module (CDA). This approach maintains inference time comparable to the optimal…
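
To make the idea of selecting knowledge across articles concrete, here is a minimal sketch. It is not the paper's QFF or CDA: the abstract does not give their details, so the bag-of-words cosine scorer stands in for a trained filter, and the names `select_chunks`, `ratio`, and `max_chunks` are hypothetical. It only illustrates pooling chunks from several articles and keeping a variable number of them, rather than picking one section from one article.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts; a stand-in for a learned encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_chunks(question, articles, ratio=0.5, max_chunks=3):
    """Pool chunks from all articles and keep the question-relevant ones.

    Assumed rule (not from the paper): keep every chunk whose score is at
    least `ratio` times the best score, capped at `max_chunks`.
    """
    q = bow(question)
    scored = [
        (cosine(q, bow(chunk)), title, chunk)
        for title, chunks in articles.items()
        for chunk in chunks
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    if not scored or scored[0][0] == 0.0:
        return []
    cutoff = ratio * scored[0][0]
    kept = [(title, chunk) for score, title, chunk in scored if score >= cutoff]
    return kept[:max_chunks]


# Toy knowledge base: two articles, each already split into chunks.
articles = {
    "Eiffel Tower": [
        "The Eiffel Tower was completed in 1889 for the World's Fair.",
        "The tower is repainted every seven years.",
    ],
    "Gustave Eiffel": [
        "Gustave Eiffel's company designed and built the tower.",
        "Eiffel was born in Dijon in 1832.",
    ],
}
selected = select_chunks(
    "Who designed the tower and when was it completed?", articles
)
```

In this toy example the question needs facts from two different articles, which is the case a single-article, single-section filter cannot cover. The relative cutoff makes the number of kept chunks depend on the score distribution, which also keeps the context passed to the answer generator short.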

Code & Models

Repositories

leaffeall/QKVQA (GitHub)
