Proof-of-Perception: Certified Tool-Using Multimodal Reasoning with Compositional Conformal Guarantees

Arya Fayyazi; Haleh Akrami

arXiv:2603.00324·cs.CV·March 3, 2026

Open Access

TL;DR

Proof-of-Perception (PoP) introduces a framework for multimodal reasoning that provides explicit reliability guarantees through conformal sets, enabling more accurate, reliable, and efficient AI reasoning with tool use.

Contribution

PoP integrates conformal guarantees into tool-using multimodal reasoning: every perception or logic node outputs a conformal set, so answers rest on verifiable evidence and extra computation is spent only under a controlled budget.

Findings

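01 Improves performance and reliability over chain-of-thought, ReAct-style, and program-of-thought baselines on document, chart, and multi-image QA benchmarks

02 Reduces error compounding and hallucinations by grounding answers in verifiable evidence

03 Provides calibrated, stepwise uncertainty through a conformal set at each perception or logic node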

Abstract

We present Proof-of-Perception (PoP), a tool-using framework that casts multimodal reasoning as an executable graph with explicit reliability guarantees. Each perception or logic node outputs a conformal set, yielding calibrated, stepwise uncertainty; a lightweight controller uses these certificates to allocate compute under a budget, expanding with extra tool calls only when needed and stopping early otherwise. This grounds answers in verifiable evidence, reduces error compounding and hallucinations, and enables principled accuracy-compute trade-offs. Across document, chart, and multi-image QA benchmarks, PoP improves performance and reliability over strong chain-of-thought, ReAct-style, and program-of-thought baselines while using computation more efficiently.
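The two mechanisms named in the abstract, a conformal set per node and a controller that spends tool calls only when needed, can be illustrated with a minimal sketch. It assumes standard split conformal prediction with the nonconformity score 1 - p. The function names (`conformal_threshold`, `conformal_set`, `run_node`), the toy tools, and the stopping rule (stop once the set is a singleton) are hypothetical choices for illustration, not the paper's implementation.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

Probs = Dict[str, float]  # label -> predicted probability from one tool call


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the set must keep every label.
        return math.inf
    return sorted(cal_scores)[k - 1]


def conformal_set(probs: Probs, threshold: float) -> Set[str]:
    """Keep every label whose nonconformity score 1 - p is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}


def run_node(
    steps: List[Tuple[Callable[[], Probs], float]], budget: int
) -> Tuple[Set[str], int]:
    """Hypothetical controller: call tools in order, stop early on a singleton set.

    Each step pairs a tool with its own calibrated threshold (an assumption:
    every tool is calibrated separately on held-out data).
    """
    current: Set[str] = set()
    calls = 0
    for tool, threshold in steps:
        if calls >= budget:
            break  # budget exhausted: return the last set as-is
        current = conformal_set(tool(), threshold)
        calls += 1
        if len(current) == 1:
            break  # certificate is already decisive: skip remaining tools
    return current, calls


# Toy calibration scores (1 - p of the true label on held-out examples).
cal = [0.05, 0.10, 0.20, 0.30, 0.40, 0.70, 0.90]
thr = conformal_threshold(cal, alpha=0.25)  # 6th smallest of 7 scores


def cheap_tool() -> Probs:
    # Stand-in for a fast reader that cannot separate two candidates.
    return {"42": 0.45, "47": 0.40, "12": 0.15}


def strong_tool() -> Probs:
    # Stand-in for a slower, higher-resolution tool call.
    return {"42": 0.90, "47": 0.07, "12": 0.03}


steps = [(cheap_tool, thr), (strong_tool, thr)]
answer_full, calls_full = run_node(steps, budget=2)
answer_tight, calls_tight = run_node(steps, budget=1)
```

With a budget of two calls, the cheap tool leaves two candidates, so the controller pays for the second call and ends with a single label. With a budget of one, it returns the two-label set unchanged, which keeps the uncertainty visible instead of forcing a guess. The sketch reuses one threshold for both tools for brevity and does not analyze how adaptive stopping or chaining several nodes affects the overall coverage level; the paper's compositional guarantees address that part.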

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Logic, Reasoning, and Knowledge · Explainable Artificial Intelligence (XAI)