Proof-of-Perception: Certified Tool-Using Multimodal Reasoning with Compositional Conformal Guarantees
Arya Fayyazi, Haleh Akrami

TL;DR
Proof-of-Perception (PoP) introduces a framework for multimodal reasoning that provides explicit reliability guarantees through conformal sets, enabling more accurate, reliable, and efficient AI reasoning with tool use.
Contribution
PoP is the first framework to integrate conformal guarantees into multimodal reasoning, allowing verifiable evidence and controlled computation in tool-using AI systems.
Findings
Improves performance and reliability over chain-of-thought, ReAct-style, and program-of-thought baselines on document, chart, and multi-image QA benchmarks
Reduces error propagation and hallucinations
Enhances reliability with calibrated, stepwise uncertainty estimates from per-node conformal sets
Abstract
We present Proof-of-Perception (PoP), a tool-using framework that casts multimodal reasoning as an executable graph with explicit reliability guarantees. Each perception or logic node outputs a conformal set, yielding calibrated, stepwise uncertainty; a lightweight controller uses these certificates to allocate compute under a budget, expanding with extra tool calls only when needed and stopping early otherwise. This grounds answers in verifiable evidence, reduces error compounding and hallucinations, and enables principled accuracy-compute trade-offs. Across document, chart, and multi-image QA benchmarks, PoP improves performance and reliability over strong chain-of-thought, ReAct-style, and program-of-thought baselines while using computation more efficiently.
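The abstract does not spell out how the conformal sets or the controller are built, so the following is only a minimal sketch of the general idea, using standard split conformal prediction with the nonconformity score 1 - p. The function names (`conformal_threshold`, `conformal_set`, `run_node`), the stopping rule based on set size, and the toy tool outputs are all hypothetical and are not taken from the paper. The sketch also does not address how coverage guarantees compose across nodes or under adaptive stopping.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label must be kept.
        return float("inf")
    return sorted(cal_scores)[k - 1]

def conformal_set(probs, threshold):
    """Keep every label whose nonconformity score (1 - p) is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}

def run_node(tools, threshold, budget, max_set_size=1):
    """Hypothetical controller: call tools in order, stop early once the set is small.

    `tools` is a list of zero-argument callables returning {label: probability}.
    Returns the last conformal set and the number of tool calls spent.
    """
    result, calls = None, 0
    for tool in tools:
        if calls >= budget:
            break  # compute budget exhausted
        calls += 1
        result = conformal_set(tool(), threshold)
        if 1 <= len(result) <= max_set_size:
            break  # set is already decisive, so skip the remaining tools
    return result, calls

# Toy calibration scores (1 - p of the true label on held-out examples); made up.
calibration = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
tau = conformal_threshold(calibration, alpha=0.25)

# Two stand-in "tools": a cheap ambiguous reader and a costlier, sharper one.
cheap_tool = lambda: {"A": 0.5, "B": 0.5}
strong_tool = lambda: {"A": 0.75, "B": 0.25}

answer_set, used = run_node([cheap_tool, strong_tool], tau, budget=2)
print(answer_set, used)
```

In this toy run the cheap tool leaves two candidate labels, so the controller spends a second call; had the first set been a singleton, it would have stopped after one call. The set-size rule is just one plausible way to turn a certificate into a stop/expand decision.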
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Logic, Reasoning, and Knowledge · Explainable Artificial Intelligence (XAI)
