Improving VQA Reliability: A Dual-Assessment Approach with Self-Reflection and Cross-Model Verification

Xixian Wu; Yang Ou; Pengchao Tian; Zian Yang; Jielei Zhang; Peiyi Li; Longwen Gao

arXiv:2512.14770·cs.CV·December 18, 2025

TL;DR

This paper introduces DAVR, a dual-assessment framework for VQA that combines self-reflection and cross-model verification to improve answer reliability and reduce hallucinations in vision-language models.

Contribution

The paper presents a novel dual-assessment framework that integrates self-reflection and external verification to enhance VQA answer trustworthiness.
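The combination of self-reflection and external verification can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `dual_assess`, the exact-match agreement check, the fixed weighted average, and the abstention threshold are hypothetical and are not taken from the paper, which describes learned selector modules rather than a hand-set rule.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Assessment:
    answer: Optional[str]  # None means the system abstains
    confidence: float      # fused reliability score in [0, 1]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def dual_assess(
    answer: str,
    self_confidence: float,
    reference_answers: Sequence[str],
    weight: float = 0.5,      # hypothetical mixing weight between the two pathways
    threshold: float = 0.6,   # hypothetical abstention threshold
) -> Assessment:
    """Fuse a self-reported confidence with agreement from reference models.

    Illustrative stand-in only: the fusion is a fixed weighted average,
    not a learned selector.
    """
    if reference_answers:
        # Cross-model pathway: fraction of reference models giving the same answer.
        matches = sum(normalize(r) == normalize(answer) for r in reference_answers)
        agreement = matches / len(reference_answers)
        fused = weight * self_confidence + (1.0 - weight) * agreement
    else:
        # No external references available: fall back to self-assessment alone.
        fused = self_confidence
    return Assessment(answer if fused >= threshold else None, fused)
```

With these assumed defaults, a confident answer that the reference models contradict is withheld, while one they mostly confirm is returned. That is the behaviour a reliability-oriented VQA system is meant to have.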

Findings

01 Achieved top scores in the Reliable VQA Challenge at ICCV-CLVL 2025.

02 Demonstrated significant improvement in answer reliability metrics.

03 Secured first place with a $Φ_{100}$ score of 39.64 and a 100-AUC of 97.22.

Abstract

Vision-language models (VLMs) have demonstrated significant potential in Visual Question Answering (VQA). However, the susceptibility of VLMs to hallucinations can lead to overconfident yet incorrect answers, severely undermining answer reliability. To address this, we propose Dual-Assessment for VLM Reliability (DAVR), a novel framework that integrates Self-Reflection and Cross-Model Verification for comprehensive uncertainty estimation. The DAVR framework features a dual-pathway architecture: one pathway leverages dual selector modules to assess response reliability by fusing VLM latent features with QA embeddings, while the other deploys external reference models for factual cross-checking to mitigate hallucinations. Evaluated in the Reliable VQA Challenge at ICCV-CLVL 2025, DAVR achieves a leading $Φ_{100}$ score of 39.64 and a 100-AUC of 97.22, securing first place and…
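The abstract reports a $Φ_{100}$ score without defining it. The sketch below assumes the effective reliability metric as it is commonly defined in reliable VQA work: an answered question earns its VQA accuracy when that accuracy is positive, an answered question with zero accuracy costs a penalty $c$, and an abstention scores zero. The function name `effective_reliability` and its signature are hypothetical, and the challenge's official scoring may differ in details such as percentage scaling.

```python
from typing import Sequence


def effective_reliability(
    accuracies: Sequence[float],  # per-question VQA accuracy in [0, 1]
    answered: Sequence[bool],     # True if the model chose to answer
    cost: float,                  # penalty c for a wrong answer (100 for Phi_100)
) -> float:
    """Mean per-question score under the assumed Phi_c definition."""
    if len(accuracies) != len(answered):
        raise ValueError("accuracies and answered must have the same length")
    if not accuracies:
        raise ValueError("at least one question is required")

    total = 0.0
    for acc, did_answer in zip(accuracies, answered):
        if not did_answer:
            continue  # abstention contributes zero
        # Answering correctly earns the accuracy; answering wrongly costs `cost`.
        total += acc if acc > 0 else -cost
    return total / len(accuracies)
```

Under this definition with $c = 100$, a single wrong answer cancels roughly a hundred correct ones, so a high score depends on abstaining whenever the answer is likely to be wrong.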

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling