Improving VQA Reliability: A Dual-Assessment Approach with Self-Reflection and Cross-Model Verification
Xixian Wu, Yang Ou, Pengchao Tian, Zian Yang, Jielei Zhang, Peiyi Li, Longwen Gao

TL;DR
This paper introduces DAVR, a dual-assessment framework for VQA that combines self-reflection and cross-model verification to improve answer reliability and reduce hallucinations in vision-language models.
Contribution
The paper presents a novel dual-assessment framework that integrates self-reflection and external verification to enhance VQA answer trustworthiness.
Findings
Achieved top scores in the Reliable VQA Challenge at ICCV-CLVL 2025.
Demonstrated significant improvement in answer reliability metrics.
Secured first place with a leading score of 39.64 and a 100-AUC of 97.22.
Abstract
Vision-language models (VLMs) have demonstrated significant potential in Visual Question Answering (VQA). However, the susceptibility of VLMs to hallucinations can lead to overconfident yet incorrect answers, severely undermining answer reliability. To address this, we propose Dual-Assessment for VLM Reliability (DAVR), a novel framework that integrates Self-Reflection and Cross-Model Verification for comprehensive uncertainty estimation. The DAVR framework features a dual-pathway architecture: one pathway leverages dual selector modules to assess response reliability by fusing VLM latent features with QA embeddings, while the other deploys external reference models for factual cross-checking to mitigate hallucinations. Evaluated in the Reliable VQA Challenge at ICCV-CLVL 2025, DAVR achieves a leading score of 39.64 and a 100-AUC of 97.22, securing first place and…
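The two pathways described in the abstract can be pictured with a toy decision rule. The sketch below is illustrative only and is not the authors' implementation: it assumes the self-reflection pathway has already produced a scalar confidence in [0, 1], stands in for cross-model verification with simple exact-match agreement against reference-model answers, and combines the two with a weighted average. The names `dual_assess`, `agreement_rate`, `weight`, and `threshold` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Assessment:
    answer: Optional[str]  # None means the system abstains
    confidence: float      # combined reliability estimate in [0, 1]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().strip().split())


def agreement_rate(answer: str, reference_answers: List[str]) -> float:
    """Fraction of reference-model answers that match the candidate answer.

    Exact match after normalization is an assumption made for this sketch.
    """
    if not reference_answers:
        return 0.0
    target = normalize(answer)
    hits = sum(1 for ref in reference_answers if normalize(ref) == target)
    return hits / len(reference_answers)


def dual_assess(
    answer: str,
    self_confidence: float,        # hypothetical output of a selector module
    reference_answers: List[str],  # hypothetical outputs of reference models
    weight: float = 0.5,           # hypothetical mixing weight
    threshold: float = 0.6,        # hypothetical abstention threshold
) -> Assessment:
    """Combine both signals; answer only if the combined score is high enough."""
    cross = agreement_rate(answer, reference_answers)
    combined = weight * self_confidence + (1.0 - weight) * cross
    if combined >= threshold:
        return Assessment(answer, combined)
    return Assessment(None, combined)


if __name__ == "__main__":
    print(dual_assess("A red bus", 0.9, ["a red bus", "A  red bus", "blue car"]))
    print(dual_assess("cat", 0.4, ["dog", "dog"]))
```

With these assumed defaults, an answer that the selector trusts and most reference models repeat is returned, while a low-confidence answer that no reference model supports leads to abstention. Abstaining rather than guessing is the behavior that reliability-oriented VQA evaluation is meant to reward.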
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
