AutoVQA-G: Self-Improving Agentic Framework for Automated Visual Question Answering and Grounding Annotation

Rongsheng Hu; Runwei Guan; Yicheng Di; Jiayu Bao; Yuan Liu

arXiv:2604.17488·cs.CV·April 21, 2026

TL;DR

AutoVQA-G is a novel self-improving framework that automates high-quality visual question answering and grounding annotation through iterative refinement and reasoning, enhancing dataset fidelity for vision-language models.

Contribution

The paper introduces an iterative, self-improving agentic framework that uses Chain-of-Thought reasoning and prompt optimization to improve the quality of automatically generated VQA-G datasets.

Findings

01 AutoVQA-G outperforms existing methods in visual grounding accuracy.

02 The framework effectively refines annotations through feedback and reasoning.

03 Generated datasets facilitate more robust vision-language model training.

Abstract

Manual annotation of high-quality visual question answering with grounding (VQA-G) datasets, which pair visual questions with evidential grounding, is crucial for advancing vision-language models (VLMs), but remains unscalable. Existing automated methods are often hindered by two key issues: (1) inconsistent data fidelity due to model hallucinations; (2) brittle verification mechanisms based on simple heuristics. To address these limitations, we introduce AutoVQA-G, a self-improving agentic framework for automated VQA-G annotation. AutoVQA-G employs an iterative refinement loop where a Consistency Evaluation module uses Chain-of-Thought (CoT) reasoning for fine-grained visual verification. Based on this feedback, a memory-augmented Prompt Optimization agent analyzes critiques from failed samples to progressively refine generation prompts. Our experiments show that AutoVQA-G generates…
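
The refinement loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Sample`, `Verdict`, `refine`, and the `toy_*` stubs are hypothetical, the real generator and evaluator would be VLM calls, and the stopping rule (`max_rounds`) is an assumption. The sketch only shows the control flow: generate, verify, collect critiques from failed samples into memory, and rewrite the prompt before retrying.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    question: str
    answer: str
    box: Tuple[int, int, int, int]  # hypothetical (x1, y1, x2, y2) grounding box


@dataclass
class Verdict:
    consistent: bool
    critique: str  # evaluator's explanation when the sample fails


def refine(
    images: List[str],
    prompt: str,
    generate: Callable[[str, str], Sample],
    evaluate: Callable[[str, Sample], Verdict],
    optimize: Callable[[str, List[str]], str],
    max_rounds: int = 3,  # assumed stopping rule, not taken from the paper
) -> Tuple[List[Sample], str]:
    """Iteratively regenerate failed samples with a progressively refined prompt."""
    memory: List[str] = []       # critiques accumulated across rounds
    accepted: List[Sample] = []
    pending = list(images)
    for _ in range(max_rounds):
        if not pending:
            break
        failed: List[str] = []
        for image in pending:
            sample = generate(image, prompt)
            verdict = evaluate(image, sample)
            if verdict.consistent:
                accepted.append(sample)
            else:
                failed.append(image)
                memory.append(verdict.critique)
        if failed:
            # The optimizer sees every critique so far, not just this round's.
            prompt = optimize(prompt, memory)
        pending = failed
    return accepted, prompt


# Toy stand-ins for the model-backed components (purely illustrative).
def toy_generate(image: str, prompt: str) -> Sample:
    answer = "red car" if "Be specific" in prompt else "unknown"
    return Sample("What is shown?", answer, (0, 0, 10, 10))


def toy_evaluate(image: str, sample: Sample) -> Verdict:
    if sample.answer == "unknown":
        return Verdict(False, "answer is not supported by the image")
    return Verdict(True, "")


def toy_optimize(prompt: str, critiques: List[str]) -> str:
    return prompt + " Be specific."


accepted, final_prompt = refine(
    ["a.jpg", "b.jpg"],
    "Ask a question about the image.",
    toy_generate,
    toy_evaluate,
    toy_optimize,
)
```

With these stubs, both images fail the first round, the prompt is rewritten once, and both pass on the second round. Passing the three components as callables keeps the loop independent of any particular model or API.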

Code & Models

Repositories

rohnson1999/AutoVQA-G (GitHub)
