VIRO: Robust and Efficient Neuro-Symbolic Reasoning with Verification for Referring Expression Comprehension

Hyejin Park; Junhyuk Kwon; Suha Kwak; Jungseul Ok

arXiv:2601.12781·cs.AI·March 23, 2026

TL;DR

VIRO introduces verification-integrated reasoning operators in neuro-symbolic models for referring expression comprehension, significantly improving robustness and accuracy, especially in no-target scenarios, while maintaining efficiency and scalability.

Contribution

The paper proposes VIRO, a novel neuro-symbolic framework with embedded verifiers that enhance robustness and reduce cascading errors in referring expression comprehension.

Findings

01. Achieves 61.1% balanced accuracy in target and no-target detection.

02. Demonstrates a low program failure rate of 0.3%.

03. Generalizes effectively to real-world egocentric data.
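
For readers unfamiliar with the metric in the first finding, the sketch below computes balanced accuracy under its standard definition: the mean of the accuracy on target-present cases and the accuracy on no-target cases. This is an assumption about how the figure is computed; the paper's exact protocol (for example, whether a target-present case also requires a correct localization) is not stated in this summary. The function name and the counts are hypothetical and are not taken from the paper.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of per-class accuracy for a two-class problem.

    tp / fn: target-present cases answered correctly / incorrectly.
    tn / fp: no-target cases correctly rejected / wrongly answered with a box.
    """
    target_acc = tp / (tp + fn)      # accuracy on target-present cases
    no_target_acc = tn / (tn + fp)   # accuracy on no-target cases
    return (target_acc + no_target_acc) / 2


# Made-up counts for illustration only (not results from the paper).
score = balanced_accuracy(tp=80, fn=20, tn=40, fp=60)
print(f"balanced accuracy: {score:.3f}")
```

Because the two classes are averaged with equal weight, a model that always returns a box cannot score well simply because target-present queries dominate the test set.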

Abstract

Referring Expression Comprehension (REC) aims to localize the image region corresponding to a natural language query. Recent neuro-symbolic REC approaches leverage large language models (LLMs) and vision-language models (VLMs) to perform compositional reasoning, decomposing queries into structured programs and executing them step-by-step. While such approaches achieve interpretable reasoning and strong zero-shot generalization, they assume that intermediate reasoning steps are accurate. However, this assumption causes cascading errors: false detections and invalid relations propagate through the reasoning chain, yielding high-confidence false positives even when no target is present in the image. To address this limitation, we introduce Verification-Integrated Reasoning Operators (VIRO), a neuro-symbolic framework that embeds lightweight operator-level verifiers within reasoning steps.…
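
The idea of operator-level verification described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the `Box`, `Step`, `find`, `leftmost`, `min_score`, and `run_program` names are hypothetical, the detections are hand-written rather than produced by a model, and a simple confidence threshold stands in for whatever verifier the paper actually uses. The point is only the control flow: each reasoning step filters its own output through a verifier, and the program abstains (returns `None`) as soon as no candidate survives, instead of passing a doubtful detection down the chain.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Box:
    label: str    # detector's class label
    x: float      # left edge of the box, arbitrary units
    score: float  # detector confidence in [0, 1]


@dataclass(frozen=True)
class Step:
    operator: Callable[[List[Box]], List[Box]]  # one reasoning step
    verify: Callable[[Box], bool]               # per-candidate check (hypothetical)


def find(label: str) -> Callable[[List[Box]], List[Box]]:
    """Operator: keep candidates whose label matches."""
    return lambda boxes: [b for b in boxes if b.label == label]


def leftmost(boxes: List[Box]) -> List[Box]:
    """Operator: keep only the candidate with the smallest x."""
    return [min(boxes, key=lambda b: b.x)] if boxes else []


def min_score(threshold: float) -> Callable[[Box], bool]:
    """Stand-in verifier: accept a candidate only if it is confident enough."""
    return lambda b: b.score >= threshold


def run_program(steps: List[Step], boxes: List[Box]) -> Optional[Box]:
    """Run steps in order; abstain as soon as a step leaves no verified candidate."""
    current = boxes
    for step in steps:
        current = [b for b in step.operator(current) if step.verify(b)]
        if not current:
            return None  # no-target answer instead of a forced guess
    return max(current, key=lambda b: b.score, default=None)


# Hand-written detections for illustration only.
detections = [Box("dog", 10.0, 0.9), Box("dog", 50.0, 0.8), Box("cat", 5.0, 0.3)]
accept_all = lambda b: True

dog_program = [Step(find("dog"), min_score(0.5)), Step(leftmost, accept_all)]
cat_program = [Step(find("cat"), min_score(0.5)), Step(leftmost, accept_all)]

dog_result = run_program(dog_program, detections)
cat_result = run_program(cat_program, detections)
print(dog_result)
print(cat_result)
```

In this toy run, the query for the leftmost dog resolves to the dog at `x = 10.0`, while the query for a cat abstains because the only cat detection fails verification. Without the per-step check, the low-confidence cat box would have been returned as a false positive.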

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Ferroelectric and Negative Capacitance Devices