Can VLMs Reason Robustly? A Neuro-Symbolic Investigation
Weixin Chen, Antonio Vergari, Han Zhao

TL;DR
This paper investigates the robustness of vision-language models in reasoning tasks under distribution shifts, revealing limitations of current approaches and proposing a neuro-symbolic method, VLC, that enhances reasoning robustness across diverse tasks.
Contribution
The paper introduces VLC, a neuro-symbolic approach combining VLM-based concept recognition with circuit-based symbolic reasoning to improve robustness under distribution shifts.
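To make the perception/reasoning split concrete, here is a minimal sketch of the general idea. It is not the authors' VLC implementation. The concept names, their probabilities, the rule, and the assumption that concepts are independent are all hypothetical. In the sketch, a perception model (standing in for a VLM) supplies a probability for each concept. A symbolic step then computes the probability that the query holds under a fixed logical rule, using weighted model counting. This is done twice: once by brute-force enumeration, and once with a small hand-compiled circuit that gives the same result.

```python
from itertools import product
from typing import Callable, Dict

# Hypothetical concept probabilities. These stand in for a VLM's answers to
# "is concept c present in the image?"; the numbers are made up.
concept_probs: Dict[str, float] = {"red": 0.9, "circle": 0.8, "square": 0.1}


# Hypothetical rule over concepts: the query holds iff the object is a
# red circle, or it is a square. The rule stays fixed under covariate shift.
def rule(world: Dict[str, bool]) -> bool:
    return (world["red"] and world["circle"]) or world["square"]


def query_probability(probs: Dict[str, float],
                      rule: Callable[[Dict[str, bool]], bool]) -> float:
    """Weighted model counting by enumeration: sum the weight of every truth
    assignment that satisfies the rule, assuming independent concepts.
    The cost is exponential in the number of concepts."""
    names = list(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if rule(world):
            weight = 1.0
            for name, value in world.items():
                weight *= probs[name] if value else 1.0 - probs[name]
            total += weight
    return total


def query_probability_circuit(p: Dict[str, float]) -> float:
    """Hand-compiled arithmetic circuit for the same rule, rewritten as
    square OR (NOT square AND red AND circle). The two disjuncts are mutually
    exclusive (a sum node) and each conjunction uses disjoint concepts
    (product nodes), so evaluating it is linear in circuit size."""
    return p["square"] + (1.0 - p["square"]) * p["red"] * p["circle"]


print(round(query_probability(concept_probs, rule), 3))       # 0.748
print(round(query_probability_circuit(concept_probs), 3))     # 0.748
```

Only the concept probabilities depend on the image, so in this framing a perceptual shift can only affect the perception step. The rule and the circuit evaluating it stay unchanged.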
Findings
VLMs fine-tuned end-to-end achieve high in-distribution accuracy but fail to generalize under covariate shifts.
Neuro-symbolic approaches with black-box reasoning show inconsistent robustness.
VLC achieves strong, consistent performance across multiple reasoning tasks.
Abstract
Vision-Language Models (VLMs) have been applied to a wide range of reasoning tasks, yet it remains unclear whether they can reason robustly under distribution shifts. In this paper, we study covariate shifts in which the perceptual input distribution changes while the underlying prediction rules do not. To investigate this question, we consider visual deductive reasoning tasks, where a model is required to answer a query given an image and logical rules defined over the object concepts in the image. Empirically, we find that VLMs fine-tuned through gradient-based end-to-end training can achieve high in-distribution accuracy but fail to generalize under such shifts, suggesting that fine-tuning does not reliably induce the underlying reasoning function. This motivates a neuro-symbolic perspective that decouples perception from reasoning. However, we further observe that recent…
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
