TRACER: Verifiable Generative Provenance for Multimodal Tool-Using Agents
Bihui Yu, Caijun Jia, Jing Chi, Xiaohan Liu, Yining Wang, He Bai, Yuchen Liu, Jingxuan Wei, Junnan Zhu

TL;DR
TRACER introduces a framework for verifiable provenance in multimodal tool-using agents, enabling structured, claim-level evidence tracking to improve reasoning transparency and accuracy.
Contribution
It proposes a novel method for generating and verifying provenance that makes multimodal tool-using agents more transparent and easier to optimize with reinforcement learning.
Findings
TRACER achieves 78.23% answer accuracy on TRACE-Bench.
It outperforms the baseline by 23.80 percentage points.
It reduces total tool calls from 4949 to 3486.
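The findings can be related with a little arithmetic. The implied baseline accuracy assumes the 23.80-point gap is measured on TRACE-Bench against the same 78.23% figure, which the summary does not state explicitly; the tool-call reduction is computed directly from the two reported totals.

```latex
\begin{aligned}
\text{implied baseline accuracy} &\approx 78.23\% - 23.80\,\text{pp} = 54.43\% \\
\text{relative reduction in tool calls} &= \frac{4949 - 3486}{4949} = \frac{1463}{4949} \approx 29.6\%
\end{aligned}
```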
Abstract
Multimodal large language models increasingly solve vision-centric tasks by calling external tools for visual inspection, OCR, retrieval, calculation, and multi-step reasoning. Current tool-using agents usually expose the executed tool trajectory and the final answer, but they rarely specify which tool observation supports each generated claim. We call this missing claim-level dependency structure the provenance gap. The gap makes tool use hard to verify and hard to optimize, because useful evidence, redundant exploration, and unsupported reasoning are mixed in the same trajectory. We introduce TRACER, a framework for verifiable generative provenance in multimodal tool-using agents. Instead of adding citations after generation, TRACER generates each answer sentence together with a structured provenance record that identifies the supporting tool turn, evidence unit, and semantic support…
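To make the idea of a claim-level provenance record concrete, here is a minimal Python sketch. The schema (`ToolTurn`, `ProvenanceRecord`, `AnswerSentence`), the `support` labels, and the `check_provenance` helper are hypothetical illustrations, not TRACER's actual record format or verifier. In particular, a plain substring match stands in for the semantic support check the abstract describes. Each answer sentence carries a record naming a tool turn and an evidence unit; the checker flags sentences whose record does not resolve to real evidence, and tool turns that no sentence cites.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToolTurn:
    """One executed tool call and its observation (hypothetical schema)."""
    turn_id: int
    tool: str  # e.g. "ocr", "retrieval", "calculator"
    observation: str


@dataclass
class ProvenanceRecord:
    """Hypothetical claim-level record: supporting turn, evidence unit, support type."""
    tool_turn: int
    evidence_unit: str  # span expected to appear in that turn's observation
    support: str  # hypothetical label such as "direct"; carried but not checked here


@dataclass
class AnswerSentence:
    text: str
    provenance: Optional[ProvenanceRecord]


def check_provenance(
    trajectory: List[ToolTurn], answer: List[AnswerSentence]
) -> Tuple[List[int], List[int]]:
    """Return (indices of unsupported sentences, ids of tool turns never cited)."""
    turns = {t.turn_id: t for t in trajectory}
    unsupported: List[int] = []
    cited = set()
    for i, sent in enumerate(answer):
        rec = sent.provenance
        turn = turns.get(rec.tool_turn) if rec is not None else None
        # Supported only if the record points at a real tool turn whose
        # observation contains the cited evidence unit (substring stand-in
        # for a semantic support check).
        if turn is None or rec.evidence_unit not in turn.observation:
            unsupported.append(i)
        else:
            cited.add(turn.turn_id)
    uncited = sorted(set(turns) - cited)
    return unsupported, uncited


# Usage: a toy trajectory with one redundant turn and one unsupported claim.
trajectory = [
    ToolTurn(1, "ocr", "Receipt total: $42.50"),
    ToolTurn(2, "retrieval", "Store opens at 9am"),
    ToolTurn(3, "calculator", "42.50 * 0.1 = 4.25"),
]
answer = [
    AnswerSentence("The receipt total is $42.50.", ProvenanceRecord(1, "$42.50", "direct")),
    AnswerSentence("A 10% tip would be $4.25.", ProvenanceRecord(3, "4.25", "direct")),
    AnswerSentence("The store is closed on Sundays.", None),
]
unsupported, uncited = check_provenance(trajectory, answer)
print(unsupported, uncited)  # [2] [2]
```

Separating "unsupported claims" from "uncited tool turns" mirrors the abstract's point that useful evidence, redundant exploration, and unsupported reasoning are otherwise mixed in one trajectory; once each sentence names its evidence, both kinds of problem become mechanically detectable.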
