Neural Concept Verifier: Scaling Prover-Verifier Games via Concept Encodings

Berkant Turan; Suhrab Asadulla; David Steinmann; Kristian Kersting; Wolfgang Stammer; Sebastian Pokutta

arXiv:2507.07532·cs.LG·February 5, 2026

TL;DR

The paper introduces Neural Concept Verifier (NCV), a framework that combines prover-verifier games with concept encodings to enable verifiable, interpretable AI on complex high-dimensional data like images.

Contribution

NCV integrates PVGs with concept encodings and minimally supervised concept discovery to improve verifiability and interpretability in high-dimensional data classification.

Findings

01. NCV outperforms classic concept-based models on complex datasets.

02. NCV mitigates shortcut learning in high-dimensional inputs.

03. NCV demonstrates effective concept-level verification for AI models.

Abstract

While Prover-Verifier Games (PVGs) offer a promising path toward verifiability in nonlinear classification models, they have not yet been applied to complex inputs such as high-dimensional images. Conversely, expressive concept encodings can effectively translate such data into interpretable concepts but are often utilized in the context of low-capacity linear predictors. In this work, we push towards real-world verifiability by combining the strengths of both approaches. We introduce Neural Concept Verifier (NCV), a unified framework combining PVGs for formal verifiability with concept encodings to handle complex, high-dimensional inputs in an interpretable way. NCV achieves this by utilizing recent minimally supervised concept discovery models to extract structured concept encodings from raw inputs. A prover then selects a subset of these encodings, which a verifier, implemented…
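The pipeline described in the abstract (concept encoding, prover selection, verifier prediction) can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated before it specifies the verifier, so the code assumes a one-hidden-layer ReLU verifier that sees only the concepts the prover selected, and it uses random numbers as stand-ins for the concept encodings, the prover's scores, and the verifier's weights. The names `select_top_k` and `verifier_logits` are hypothetical.

```python
import random

def select_top_k(scores, k):
    """Binary mask keeping the k highest-scoring concepts (the prover's selection)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]

def verifier_logits(concepts, mask, w1, w2):
    """Assumed nonlinear verifier: one ReLU layer over the masked concept encoding."""
    x = [c * m for c, m in zip(concepts, mask)]  # unselected concepts are zeroed
    hidden = [max(0.0, sum(xi * w for xi, w in zip(x, row))) for row in w1]
    return [sum(h * w for h, w in zip(hidden, row)) for row in w2]

random.seed(0)
num_concepts, num_hidden, num_classes, k = 16, 8, 3, 5

# Stand-ins (assumptions): a real system would obtain these from a concept
# discovery model, a trained prover, and a trained verifier respectively.
concepts = [random.random() for _ in range(num_concepts)]
prover_scores = [random.gauss(0, 1) for _ in range(num_concepts)]
w1 = [[random.gauss(0, 1) for _ in range(num_concepts)] for _ in range(num_hidden)]
w2 = [[random.gauss(0, 1) for _ in range(num_hidden)] for _ in range(num_classes)]

mask = select_top_k(prover_scores, k)
logits = verifier_logits(concepts, mask, w1, w2)
prediction = max(range(num_classes), key=lambda c: logits[c])
```

Because the verifier only ever receives the masked encoding, changing a concept the prover did not select cannot change the prediction, which is the property that makes the selected subset act as an inspectable justification in this sketch.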

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

The central idea, combining PVGs and CBMs, is intriguing. The focus on verifiability is timely and relevant for trustworthy AI. The experimental setup is diverse, covering both synthetic and real-world datasets. The exposition is generally clear and well organized, though sometimes too high-level.

Weaknesses

**Lack of comparison with relevant recent models** The paper compares only against plain CBMs and pixel-based PVGs. However, there has been a recent surge in models that incorporate interpretable yet nonlinear mappings—often grounded in logic or symbolic reasoning (e.g., [1], [2]). In particular, Debot et al. maintain a global logic-based task decoder that allows the use of theorem provers to provide formal proofs of desired logical properties. These works appear conceptually close to NCV, yet

Reviewer 02 · Rating 6 · Confidence 4

Strengths

S1: this work is well motivated by the gap in the current literature, where accuracy is sacrificed for interpretability or vice versa, and the results generally support the stated claims.
S2: the framework is designed to be generalizable: it allows integration of different concept extractors and verifiers, showing potential for future studies building on it.
S3: scalability is studied on real-world image datasets such as ImageNet-1k.
S4: robustness to shortcuts is studied and discussed,

Weaknesses

W1: it is sometimes unclear to the reader how NCV differs from MAC, since in Section 3 it seems that most of NCV is just an adaptation of MAC. The authors should clarify and emphasize their innovations beyond MAC (for example, there are a couple of places where the authors state ".... shifting the PVG to concept encodings ..." (line 343); this could be made clearer if the authors mentioned the original setup of the PVG, such as "... shifting the PVG from xxxxxx to concept encoding ..." to high

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. Framing prediction as a PVG on a concept bottleneck is an interesting approach to the problem of shortcut learning.
2. PVGs offer a different and more stable spin on adversarial training by shifting the adversarial intervention to the concept space instead of the input space.
3. The proposed method provides good semantic explanations without compromising predictive performance, which is a key challenge in interpretable ML.
4. The paper is easy to follow.

Weaknesses

1. The claim that CBMs are "limited by their reliance on low-capacity linear predictors" is not quite fair; CBMs can easily be combined with non-linear predictors, so this is not really a serious limitation that this paper resolves.
2. The first 3 benchmarks in Table 1 are missing a key baseline, namely, CBM on the CLIP-Sim feature space with a nonlinear probe. This is to test whether the PVG approach has a distinct performance edge over simply applying CBMs on a rich concept vocabulary with no
