Concept-RuleNet: Grounded Multi-Agent Neurosymbolic Reasoning in Vision Language Models
Sanchit Sinha, Guangzhi Xiong, Zhenghao He, Aidong Zhang

TL;DR
Concept-RuleNet introduces a multi-agent neurosymbolic reasoning system that grounds visual concepts in real data, enabling interpretable decision pathways and reducing hallucinations in vision-language models.
Contribution
It proposes a novel multi-agent framework that grounds symbols in visual data and combines them with language reasoning for transparent, accurate predictions.
Findings
Improves accuracy over neurosymbolic baselines by 5% on benchmarks.
Reduces hallucinated symbols in rules by up to 50%.
Effective on medical and underrepresented natural image datasets.
Abstract
Modern vision-language models (VLMs) deliver impressive predictive accuracy yet offer little insight into 'why' a decision is reached, frequently hallucinating facts, particularly when encountering out-of-distribution data. Neurosymbolic frameworks address this by pairing black-box perception with interpretable symbolic reasoning, but current methods extract their symbols solely from task labels, leaving them weakly grounded in the underlying visual data. In this paper, we introduce a multi-agent system, Concept-RuleNet, that reinstates visual grounding while retaining transparent reasoning. Specifically, a multimodal concept generator first mines discriminative visual concepts directly from a representative subset of training images. Next, these visual concepts are utilized to condition symbol discovery, anchoring the generations in real image statistics and mitigating label bias.…
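The first two stages described in the abstract (mining discriminative visual concepts, then conditioning symbol discovery on them) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the frequency-difference score, the prompt wording, and the sample data are all hypothetical, and each image is assumed to have already been turned into a set of concept strings by some multimodal model.

```python
from collections import Counter


def mine_discriminative_concepts(samples, top_k=2):
    """Rank concepts per label by in-class frequency minus out-of-class frequency.

    samples: list of (label, set_of_concept_strings) pairs. The scoring rule is
    an assumption made for illustration, not the paper's criterion.
    """
    labels = sorted({label for label, _ in samples})
    sizes = Counter(label for label, _ in samples)
    counts = {label: Counter() for label in labels}
    for label, concepts in samples:
        counts[label].update(concepts)

    mined = {}
    for label in labels:
        n_in = sizes[label]
        n_out = len(samples) - n_in
        scores = {}
        for concept, count in counts[label].items():
            out_count = sum(counts[o][concept] for o in labels if o != label)
            freq_out = out_count / n_out if n_out else 0.0
            scores[concept] = count / n_in - freq_out
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda c: (-scores[c], c))
        mined[label] = ranked[:top_k]
    return mined


def build_symbol_prompt(label, concepts):
    """Build a (hypothetical) prompt that conditions symbol discovery on mined concepts."""
    listed = ", ".join(concepts)
    return (
        f"Visual concepts observed in training images of '{label}': {listed}. "
        f"Propose symbols for '{label}' that are consistent with these concepts."
    )


# Toy data: stand-ins for concept sets extracted from a subset of training images.
samples = [
    ("heron", {"long neck", "water"}),
    ("heron", {"long neck", "grey feathers"}),
    ("sparrow", {"short beak", "grey feathers"}),
    ("sparrow", {"short beak", "branch"}),
]

concepts = mine_discriminative_concepts(samples, top_k=2)
prompt = build_symbol_prompt("heron", concepts["heron"])
print(concepts)
print(prompt)
```

In this toy data, "grey feathers" appears equally often in both classes, so it scores zero and is dropped. That is the intuition behind mining concepts from images rather than from labels alone: only concepts that actually separate the classes in the data reach the symbol-discovery step.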
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
