Concept-RuleNet: Grounded Multi-Agent Neurosymbolic Reasoning in Vision Language Models

Sanchit Sinha; Guangzhi Xiong; Zhenghao He; Aidong Zhang

arXiv:2511.11751 · cs.CV · November 18, 2025

TL;DR

Concept-RuleNet introduces a multi-agent neurosymbolic reasoning system that grounds visual concepts in real data, enabling interpretable decision pathways and reducing hallucinations in vision-language models.

Contribution

It proposes a novel multi-agent framework that grounds symbols in visual data and combines them with language reasoning for transparent, accurate predictions.

Findings

1. Improves accuracy over neurosymbolic baselines by 5% on benchmarks.
2. Reduces hallucinated symbols in rules by up to 50%.
3. Effective on medical and underrepresented natural image datasets.

Abstract

Modern vision-language models (VLMs) deliver impressive predictive accuracy yet offer little insight into 'why' a decision is reached, frequently hallucinating facts, particularly when encountering out-of-distribution data. Neurosymbolic frameworks address this by pairing black-box perception with interpretable symbolic reasoning, but current methods extract their symbols solely from task labels, leaving them weakly grounded in the underlying visual data. In this paper, we introduce a multi-agent system, Concept-RuleNet, that reinstates visual grounding while retaining transparent reasoning. Specifically, a multimodal concept generator first mines discriminative visual concepts directly from a representative subset of training images. Next, these visual concepts are utilized to condition symbol discovery, anchoring the generations in real image statistics and mitigating label bias.…

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare