Enhancing BERT-Based Visual Question Answering through Keyword-Driven Sentence Selection
Davide Napolitano, Lorenzo Vaiani, Luca Cagliero

TL;DR
This paper introduces a keyword-driven sentence selection method to enhance BERT-based visual question answering on multi-page documents, achieving improved performance by focusing on relevant sentences with specific keywords.
Contribution
The paper presents a novel text-only approach that fine-tunes BERT using masked language modeling and keyword-based sampling to improve document question answering accuracy.
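As a rough illustration of the masked language modeling objective mentioned above, the sketch below applies the token-corruption scheme from the original BERT recipe (select 15% of tokens; of those, replace 80% with a mask token, 10% with a random token, and leave 10% unchanged). The function name `mask_tokens`, the toy vocabulary, and the masking rates are assumptions for illustration only; the summary does not state the settings the authors used.

```python
import random

MASK = "[MASK]"  # BERT's mask token


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt a token list for masked language modeling.

    Returns (corrupted_tokens, labels), where labels[i] is the original
    token at a selected position and None elsewhere. The rates follow the
    original BERT recipe and are an assumption, not the paper's settings.
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must predict this token
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK)               # 80%: mask token
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)                # 10%: unchanged
        else:
            labels.append(None)  # position is ignored by the loss
            corrupted.append(tok)
    return corrupted, labels


if __name__ == "__main__":
    sentence = "table 2 reports the accuracy of each model".split()
    toy_vocab = ["table", "figure", "image", "model", "accuracy"]
    print(mask_tokens(sentence, toy_vocab, rng=random.Random(0)))
```

In practice a tokenizer such as WordPiece would produce the tokens, and a library data collator would typically handle this step; the hand-written version only makes the objective concrete.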
Findings
High performance compared to baselines
Effective use of keyword-focused sentence sampling
Demonstrated improvement in document VQA tasks
Abstract
The Document-based Visual Question Answering competition addresses the automatic detection of parent-child relationships between elements in multi-page documents. The goal is to identify the document elements that answer a specific question posed in natural language. This paper describes PoliTo's approach to this task. Our best solution is text-only and relies on an ad hoc sampling strategy. Specifically, it uses Masked Language Modeling to fine-tune a BERT model, focusing on sentences that contain sensitive keywords also occurring in the questions, such as references to tables or images. This approach achieves high performance compared to the baselines, showing that our solution contributes positively to the task.
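The keyword-driven sampling described in the abstract can be sketched as follows: keep only the document sentences that share at least one sensitive keyword with the questions. This is a minimal sketch under stated assumptions, not the authors' implementation. The keyword list, the naive sentence splitter, and the names `split_sentences`, `keywords_in`, and `select_sentences` are all hypothetical; the abstract only gives "references to tables or images" as examples.

```python
import re

# Hypothetical keyword list; the abstract only mentions tables and images.
KEYWORDS = ("table", "figure", "image")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keywords_in(text, keywords=KEYWORDS):
    """Return the keywords occurring in text as whole words (optional plural 's')."""
    lowered = text.lower()
    return {k for k in keywords if re.search(rf"\b{re.escape(k)}s?\b", lowered)}


def select_sentences(document, questions, keywords=KEYWORDS):
    """Keep sentences that share at least one keyword with the questions."""
    question_keywords = set()
    for question in questions:
        question_keywords |= keywords_in(question, keywords)
    return [
        sentence
        for sentence in split_sentences(document)
        if keywords_in(sentence, keywords) & question_keywords
    ]


if __name__ == "__main__":
    doc = (
        "Table 2 reports the results. The weather was nice. "
        "Figure 3 shows the pipeline. See the appendix."
    )
    print(select_sentences(doc, ["Which table reports the results?"]))
```

The selected sentences would then form the corpus used for masked language model fine-tuning. A real pipeline would likely use a proper sentence segmenter and a keyword list derived from the competition questions.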
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Adam · WordPiece · High-Order Consensuses · Dropout · Linear Warmup With Linear Decay
