Diversify, Rationalize, and Combine: Ensembling Multiple QA Strategies for Zero-shot Knowledge-based VQA

Miaoyu Li, Haoxin Li, Zilin Du, and Boyang Li

arXiv:2406.12746 · cs.CL · October 10, 2024

TL;DR

This paper introduces DietCoke, a multi-strategy ensemble framework for knowledge-based visual question answering that combines diverse reasoning tactics and rationales to improve accuracy over existing methods.

Contribution

It proposes a three-stage ensemble approach that diversifies question-answering strategies, generates rationales for each answer candidate, and combines the answers using those rationales, advancing zero-shot K-VQA performance.

Findings

01

Outperforms state-of-the-art baselines by 2.8% on OK-VQA and 4.7% on A-OKVQA.

02

Demonstrates high complementarity among ensemble strategies.

03

Validates effectiveness of rationales in answer selection.

Abstract

Knowledge-based Visual Question-answering (K-VQA) often requires the use of background knowledge beyond the image. However, we discover that a single knowledge generation strategy is often insufficient for all K-VQA questions. To this end, we propose Diversification, Evidence Truncation, and Combination for Knowledge-based Elucidation (DietCoke), which utilizes a bundle of complementary question-answering tactics and aggregates their answers using textual rationales. DietCoke comprises three stages: diversification, rationalization, and ensemble. The diversification stage generates three distinctive decision contexts, each leading to its own answer candidate. The rationalization stage generates two rationales, the automatic rationale and the mechanistic rationale, for each answer candidate using decorrelated techniques. Finally, in the ensemble stage, an LLM informed by the…
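
The three stages named in the abstract can be read as a simple control flow. The sketch below is illustrative only and is not taken from the limiaoyu/dietcoke repository: the names `Candidate`, `answer_question`, and `majority_selector` are hypothetical, the strategies and rationalizers are plain callables standing in for model calls, and the majority-vote selector is a toy substitute for the LLM-based selection used in the ensemble stage.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class Candidate:
    """One answer candidate plus the rationales generated for it."""
    strategy: str
    answer: str
    rationales: List[str] = field(default_factory=list)


# Hypothetical signatures; a real system would wrap model calls here.
Strategy = Callable[[str, str], str]            # (image_context, question) -> answer
Rationalizer = Callable[[str, str, str], str]   # (question, image_context, answer) -> rationale
Selector = Callable[[str, Sequence[Candidate]], int]  # returns index of chosen candidate


def answer_question(
    question: str,
    image_context: str,
    strategies: Dict[str, Strategy],
    rationalizers: Sequence[Rationalizer],
    select: Selector,
) -> Candidate:
    # Stage 1 (diversification): each strategy yields its own answer candidate.
    candidates = [
        Candidate(name, strategy(image_context, question))
        for name, strategy in strategies.items()
    ]
    # Stage 2 (rationalization): attach one rationale per rationalizer.
    for cand in candidates:
        cand.rationales = [
            r(question, image_context, cand.answer) for r in rationalizers
        ]
    # Stage 3 (ensemble): a selector that sees answers and rationales picks one.
    return candidates[select(question, candidates)]


def majority_selector(question: str, candidates: Sequence[Candidate]) -> int:
    """Toy stand-in for the LLM selector: pick the most frequent answer."""
    counts = Counter(c.answer for c in candidates)
    top_answer, _ = counts.most_common(1)[0]
    return next(i for i, c in enumerate(candidates) if c.answer == top_answer)
```

Passing the selector as a callable keeps the ensemble stage swappable: the toy majority vote ignores the rationales entirely, whereas a selector in the spirit of the abstract would read them before choosing.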

Code & Models

Repositories

limiaoyu/dietcoke
PyTorch · Official

Taxonomy

Topics: Industrial Vision Systems and Defect Detection