OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs
Qianqi Yan, Yichen Guo, Ching-Chen Kuo, Shan Jiang, Hang Yin, Yang Zhao, Xin Eric Wang

TL;DR
OmniTrace is a versatile, model-agnostic framework that offers generation-time attribution for multimodal large language models, enabling coherent, span-level explanations during open-ended multimodal generation.
Contribution
It introduces a unified protocol that frames attribution as a generation-time tracing problem and applies across token-level signals (such as attention weights or gradient-based scores) and modalities without retraining.
Findings
Generation-aware span-level attribution yields more stable explanations.
OmniTrace outperforms naive self-attribution and embedding baselines.
The framework is effective across visual, audio, and video tasks.
Abstract
Modern multimodal large language models (MLLMs) generate fluent responses from interleaved text, image, audio, and video inputs. However, identifying which input sources support each generated statement remains an open challenge. Existing attribution methods are primarily designed for classification settings, fixed prediction targets, or single-modality architectures, and do not naturally extend to autoregressive, decoder-only models performing open-ended multimodal generation. We introduce OmniTrace, a lightweight and model-agnostic framework that formalizes attribution as a generation-time tracing problem over the causal decoding process. OmniTrace provides a unified protocol that converts arbitrary token-level signals such as attention weights or gradient-based scores into coherent span-level, cross-modal explanations during decoding. It traces each generated token to multimodal…
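To make the idea of converting token-level signals into span-level, cross-modal explanations more concrete, here is a minimal Python sketch. It is not the OmniTrace algorithm, whose details are not given in the text above. It only illustrates one simple way such an aggregation could work, assuming a per-token score matrix (for example, attention weights) is already available. The function name `aggregate_span_attribution`, the source labels, and the sum-then-normalize rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def aggregate_span_attribution(token_scores, source_of_input, span_of_output):
    """Pool token-level scores into per-span, per-source attributions.

    Illustrative assumption only, not the paper's method.

    token_scores[t][i]  -- score generated token t assigns to input position i
    source_of_input[i]  -- source label of input position i (e.g. "image_0")
    span_of_output[t]   -- index of the output span that token t belongs to
    Returns {span: {source: share}}, where shares in each span sum to 1.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for t, row in enumerate(token_scores):
        span = span_of_output[t]
        for i, score in enumerate(row):
            totals[span][source_of_input[i]] += score

    attribution = {}
    for span, per_source in totals.items():
        mass = sum(per_source.values())
        # Normalize so each span's scores form a distribution over sources.
        attribution[span] = {
            src: (value / mass if mass > 0 else 0.0)
            for src, value in per_source.items()
        }
    return attribution


# Toy example: three generated tokens, three input positions.
# Input positions 0 and 1 come from an image, position 2 from an audio clip.
token_scores = [
    [0.6, 0.2, 0.2],  # token 0 (span 0)
    [0.5, 0.3, 0.2],  # token 1 (span 0)
    [0.1, 0.1, 0.8],  # token 2 (span 1)
]
source_of_input = ["image_0", "image_0", "audio_0"]
span_of_output = [0, 0, 1]

attribution = aggregate_span_attribution(
    token_scores, source_of_input, span_of_output
)
# Pick the highest-scoring source for each span.
top_source = {
    span: max(scores, key=scores.get) for span, scores in attribution.items()
}
print(attribution)
print(top_source)
```

In this toy input, the first span draws most of its score mass from the image positions and the second span from the audio position. Pooling over a span rather than reporting per-token scores is the kind of step that could make explanations less noisy, which is in line with the stability finding listed above.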
