VIKSER: Visual Knowledge-Driven Self-Reinforcing Reasoning Framework
Chao Wang, Chunbai Zhang, Yongxiao Tian, Yang Zhou, and Yan Peng

TL;DR
VIKSER is a visual reasoning framework that uses knowledge distillation, fine-grained visual knowledge, and self-reflection to improve interpretability and achieve state-of-the-art results on visual question answering datasets.
Contribution
The paper presents VIKSER, a novel framework that combines knowledge distillation, visual relationship detection, and self-reflection for enhanced visual reasoning interpretability and performance.
Findings
Achieves new state-of-the-art results on visual reasoning datasets.
Performs on par with leading proprietary models like ChatGPT-5.
Demonstrates improved interpretability through Chain-of-Evidence prompting.
Abstract
Visual reasoning refers to the task of solving questions about visual information. Current visual reasoning methods typically employ pre-trained vision-language model (VLM) strategies or deep neural network approaches. However, existing efforts are constrained by limited reasoning interpretability and are hindered by underspecification in the question text. Additionally, the absence of fine-grained visual knowledge limits the precise understanding of subject behavior in visual reasoning tasks. To address these issues, we propose VIKSER (Visual Knowledge-Driven Self-Reinforcing Reasoning Framework). Specifically, VIKSER, trained using knowledge distilled from large language models, extracts fine-grained visual knowledge with the assistance of visual relationship detection techniques. Subsequently, VIKSER utilizes fine-grained visual knowledge to paraphrase the question…
Taxonomy
Topics: Semantic Web and Ontologies · Data Visualization and Analytics
