Generative Score Inference for Multimodal Data

Xinyu Tian; Xiaotong Shen

arXiv:2603.26349 · stat.ML · March 30, 2026

TL;DR

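The paper introduces Generative Score Inference (GSI), a flexible framework for uncertainty quantification in multimodal data such as images and text. GSI uses synthetic samples from deep generative models to approximate conditional score distributions, and builds prediction and confidence sets from those approximations.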

Contribution

GSI is a novel, generalizable inference method that constructs statistically valid prediction and confidence sets from synthetic samples, avoiding the rigid assumptions and limited generalizability of existing approaches.

Findings

1. GSI achieves state-of-the-art hallucination detection in language models.
2. GSI provides robust uncertainty estimates in image captioning.
3. Performance improves with higher-quality generative models.

Abstract

Accurate uncertainty quantification is crucial for making reliable decisions in various supervised learning scenarios, particularly when dealing with complex, multimodal data such as images and text. Current approaches often face notable limitations, including rigid assumptions and limited generalizability, constraining their effectiveness across diverse supervised learning tasks. To overcome these limitations, we introduce Generative Score Inference (GSI), a flexible inference framework capable of constructing statistically valid and informative prediction and confidence sets across a wide range of multimodal learning problems. GSI utilizes synthetic samples generated by deep generative models to approximate conditional score distributions, facilitating precise uncertainty quantification without imposing restrictive assumptions about the data or tasks. We empirically validate GSI's…
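The abstract describes approximating a conditional score distribution with synthetic samples from a generative model. The sketch below illustrates that general idea under our own assumptions; it is not the authors' GSI algorithm, whose details the abstract does not give. The names `toy_generator`, `abs_residual`, `approx_score_quantile`, and `in_prediction_set` are hypothetical: a Gaussian stands in for a deep generative model, and an absolute residual stands in for a task-specific score. Given an input x, the sketch draws synthetic responses, scores them, and thresholds at the empirical (1 − α) quantile to form a prediction set.

```python
import math
import random
from typing import Callable, List, Optional


def approx_score_quantile(
    x: float,
    sample_y: Callable[[float, random.Random], float],  # hypothetical generator: draws y ~ G(. | x)
    score: Callable[[float, float], float],             # hypothetical score s(x, y)
    alpha: float = 0.05,
    num_samples: int = 10_000,
    rng: Optional[random.Random] = None,
) -> float:
    """Monte Carlo estimate of the (1 - alpha) quantile of s(x, Y) given X = x."""
    if rng is None:
        rng = random.Random()
    # Synthetic samples approximate the conditional score distribution at x.
    scores: List[float] = sorted(score(x, sample_y(x, rng)) for _ in range(num_samples))
    # 1-based rank of the empirical (1 - alpha) quantile, clamped to the sample size.
    k = min(num_samples, max(1, math.ceil((1 - alpha) * num_samples)))
    return scores[k - 1]


def in_prediction_set(
    x: float, y: float, score: Callable[[float, float], float], threshold: float
) -> bool:
    """A candidate y belongs to the prediction set if its score is within the threshold."""
    return score(x, y) <= threshold


def toy_generator(x: float, rng: random.Random) -> float:
    # Stand-in "generative model" (assumption): Y | X = x ~ Normal(2x, 1).
    return rng.gauss(2.0 * x, 1.0)


def abs_residual(x: float, y: float) -> float:
    # Stand-in score (assumption): absolute deviation from the point predictor f(x) = 2x.
    return abs(y - 2.0 * x)


rng = random.Random(0)
t = approx_score_quantile(1.5, toy_generator, abs_residual, alpha=0.05, num_samples=20_000, rng=rng)
# t should be near 1.96, the 0.95 quantile of |N(0, 1)|, so the set is roughly [3 - t, 3 + t].
print(f"threshold ~ {t:.3f}; interval ~ [{3 - t:.3f}, {3 + t:.3f}]")
```

Replacing `toy_generator` with a trained conditional generator and `abs_residual` with a task-appropriate score would follow the same pattern. In this plug-in form, coverage holds only to the extent that the generator matches the true conditional distribution, which is consistent with the finding that performance improves with higher-quality generative models.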
