VietMEAgent: Culturally-Aware Few-Shot Multimodal Explanation for Vietnamese Visual Question Answering

Hai-Dang Nguyen; Minh-Anh Dang; Minh-Tan Le; Minh-Tuan Le

arXiv:2511.09058·cs.CV·November 13, 2025

TL;DR

VietMEAgent is a multimodal, explainable VQA system tailored for Vietnamese culture, integrating cultural knowledge and structured explanations to improve interpretability and cultural understanding in AI.

Contribution

The paper introduces VietMEAgent, a novel culturally-aware VQA framework with a dedicated Vietnamese cultural knowledge base and explainability modules, addressing cultural specificity and interpretability challenges.

Findings

01. Effective cultural object detection in Vietnamese images

02. Generation of transparent, human-readable explanations

03. Demonstrated on a new Vietnamese Cultural VQA dataset

Abstract

Contemporary Visual Question Answering (VQA) systems remain constrained when confronted with culturally specific content, largely because cultural knowledge is under-represented in training corpora and the reasoning process is not rendered interpretable to end users. This paper introduces VietMEAgent, a multimodal explainable framework engineered for Vietnamese cultural understanding. The method integrates a cultural object detection backbone with a structured program generation layer, yielding a pipeline in which answer prediction and explanation are tightly coupled. A curated knowledge base of Vietnamese cultural entities serves as an explicit source of background information, while a dual-modality explanation module combines attention-based visual evidence with structured, human-readable textual rationales. We further construct a Vietnamese Cultural VQA dataset sourced from public…
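
The coupling of answer prediction and explanation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` and `Explanation` types, the `explain` function, the `min_score` threshold, the program step names, and the two knowledge-base entries are all hypothetical and chosen only for illustration. The sketch assumes that detections arrive with a confidence score and an attention weight, and it omits question parsing entirely.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative entries only; the paper's knowledge-base schema is not shown here.
KNOWLEDGE_BASE = {
    "ao dai": "a traditional Vietnamese long tunic worn over trousers",
    "non la": "a conical leaf hat",
}


@dataclass
class Detection:
    label: str        # entity name, assumed to match a knowledge-base key
    score: float      # detector confidence in [0, 1]
    attention: float  # hypothetical share of visual attention on the region


@dataclass
class Explanation:
    answer: str
    program: list[str]          # structured steps that produced the answer
    visual_evidence: list[str]  # labels ordered by attention, highest first
    rationale: str              # human-readable textual explanation


def explain(detections: list[Detection], min_score: float = 0.5) -> Explanation:
    """Derive the answer and its explanation from the same detections."""
    # Keep confident detections that have a knowledge-base entry.
    known = [
        d for d in detections
        if d.score >= min_score and d.label in KNOWLEDGE_BASE
    ]
    if not known:
        return Explanation(
            "unknown", ["detect()"], [], "No known cultural entity was detected."
        )

    # Rank by attention so the visual evidence and the answer agree.
    ranked = sorted(known, key=lambda d: d.attention, reverse=True)
    top = ranked[0]
    program = [
        f"detect({top.label})",
        f"lookup({top.label})",
        f"answer({top.label})",
    ]
    rationale = f"The image shows {top.label}, {KNOWLEDGE_BASE[top.label]}."
    return Explanation(top.label, program, [d.label for d in ranked], rationale)
```

Because the answer, the program steps, and the rationale are all built from the same top-ranked detection, the explanation cannot drift away from the prediction, which is the property the abstract calls tight coupling. A real system would replace the dictionary lookup and the attention field with learned components.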

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning