MGA-VQA: Secure and Interpretable Graph-Augmented Visual Question Answering with Memory-Guided Protection Against Unauthorized Knowledge Use

Ahmad Mohammadshirazi; Pinaki Prasad Guha Neogi; Dheeraj Kulshrestha; Rajiv Ramnath

arXiv:2511.17881 · cs.CV · November 25, 2025

TL;DR

MGA-VQA is a multimodal framework for Document Visual Question Answering (DocVQA) that improves interpretability, multi-hop reasoning, and efficiency by combining spatial graph reasoning, memory-augmented inference, and question-guided compression. It outperforms prior models on six benchmarks.

Contribution

The paper introduces a graph-augmented, memory-guided approach to DocVQA that improves interpretability and reasoning over existing black-box models.

Findings

01

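Achieves higher accuracy than prior models on six benchmarks (FUNSD, CORD, SROIE, DocVQA, STE-VQA, and RICO), in both answer prediction and spatial localization.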

02

Enhances interpretability with graph-based decision pathways.

03

Demonstrates improved efficiency in high-resolution document processing.
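The TL;DR attributes part of the efficiency gain to question-guided compression, but this page does not say how tokens are selected. The sketch below is a hypothetical illustration, not the authors' method. It assumes that compression keeps the `keep` document tokens whose embeddings are most cosine-similar to a question embedding and then restores their reading order. The helper names `cosine` and `compress_tokens` and the plain-list embedding inputs are all assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def compress_tokens(token_vecs: List[Sequence[float]],
                    question_vec: Sequence[float],
                    keep: int) -> List[int]:
    """Hypothetical question-guided compression: return the indices of the
    `keep` tokens most similar to the question, in original document order."""
    if keep <= 0:
        return []
    if keep >= len(token_vecs):
        return list(range(len(token_vecs)))
    scores = [cosine(v, question_vec) for v in token_vecs]
    # Rank token indices by relevance to the question, highest first.
    ranked = sorted(range(len(token_vecs)), key=lambda i: scores[i], reverse=True)
    # Keep the top-k, then sort by index to restore reading order.
    return sorted(ranked[:keep])
```

For example, with token vectors `[[1, 0], [0, 1], [0.9, 0.1], [-1, 0]]`, question vector `[1, 0]`, and `keep=2`, the function keeps indices `[0, 2]`. Returning the survivors in reading order rather than score order is a design choice meant to keep the retained tokens aligned with the document's layout.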

Abstract

Document Visual Question Answering (DocVQA) requires models to jointly understand textual semantics, spatial layout, and visual features. Current methods struggle with explicit spatial relationship modeling, efficient processing of high-resolution documents, and multi-hop reasoning, and they offer limited interpretability. We propose MGA-VQA, a multi-modal framework that integrates token-level encoding, spatial graph reasoning, memory-augmented inference, and question-guided compression. Unlike prior black-box models, MGA-VQA introduces interpretable graph-based decision pathways and structured memory access for enhanced reasoning transparency. Evaluation across six benchmarks (FUNSD, CORD, SROIE, DocVQA, STE-VQA, and RICO) demonstrates superior accuracy and efficiency, with consistent improvements in both answer prediction and spatial localization.
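The abstract lists spatial graph reasoning as a core component but does not describe how the graph is built. As a hypothetical sketch, not the paper's construction, one common way to build a layout graph is to connect each OCR token box to its k nearest neighbours by centre distance and to label each edge with a coarse direction. The box format `(x0, y0, x1, y1)` with the origin at the top left, the value of `k`, the four direction labels, and the helper names are all assumptions.

```python
import math
from typing import Dict, List, Tuple

# Assumed box format: (x0, y0, x1, y1), origin at top left, y grows downward.
Box = Tuple[float, float, float, float]


def center(b: Box) -> Tuple[float, float]:
    """Centre point of a bounding box."""
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def relation(a: Box, b: Box) -> str:
    """Coarse direction of box b relative to box a, by the dominant axis."""
    (ax, ay), (bx, by) = center(a), center(b)
    dx, dy = bx - ax, by - ay
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "below" if dy > 0 else "above"


def build_spatial_graph(boxes: List[Box], k: int = 2) -> Dict[Tuple[int, int], str]:
    """Hypothetical k-nearest-neighbour layout graph.

    Returns directed edges (i, j) -> direction of box j relative to box i.
    Both directions are stored, so the edge set is symmetric."""
    edges: Dict[Tuple[int, int], str] = {}
    centers = [center(b) for b in boxes]
    for i, ci in enumerate(centers):
        # Sort other boxes by Euclidean distance between centres.
        others = sorted((math.dist(ci, cj), j)
                        for j, cj in enumerate(centers) if j != i)
        for _, j in others[:k]:
            edges[(i, j)] = relation(boxes[i], boxes[j])
            edges[(j, i)] = relation(boxes[j], boxes[i])
    return edges
```

For three boxes, a label at `(0, 0, 10, 10)`, a value to its right at `(20, 0, 30, 10)`, and a second label below at `(0, 20, 10, 30)`, calling `build_spatial_graph(boxes, k=1)` yields the edges `(0, 1): "right"`, `(1, 0): "left"`, `(0, 2): "below"`, and `(2, 0): "above"`. A graph neural network could then pass messages along these edges; that step is outside this sketch.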

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Topic Modeling