A Theory for Conditional Generative Modeling on Multiple Data Sources

Rongzhen Wang; Yan Zhang; Chenyu Zheng; Chongxuan Li; Guoqiang Wu

arXiv:2502.14583 · cs.LG · July 9, 2025

TL;DR

This paper provides a theoretical framework for how multi-source data affects conditional generative models. It shows that when the source distributions share similarities and the model is expressive enough, multi-source training guarantees a sharper estimation error bound than single-source training.

Contribution

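The paper takes a first step toward a rigorous analysis of multi-source training in conditional generative modeling. It establishes a distribution estimation error bound in average total variation distance and characterizes when source similarity makes multi-source training beneficial.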

Findings

01 Multi-source training can outperform single-source training when the sources share similarities.

02 The theoretical bounds depend on the number of sources and on how similar their distributions are.

03 Experiments validate the theoretical advantages of multi-source over single-source training.

Abstract

The success of large generative models has driven a paradigm shift, leveraging massive multi-source data to enhance model capabilities. However, the interaction among these sources remains theoretically underexplored. This paper takes the first step toward a rigorous analysis of multi-source training in conditional generative modeling, where each condition represents a distinct data source. Specifically, we establish a general distribution estimation error bound in average total variation distance for conditional maximum likelihood estimation based on the bracketing number. Our result shows that when source distributions share certain similarities and the model is expressive enough, multi-source training guarantees a sharper bound than single-source training. We further instantiate the general theory on conditional Gaussian estimation and deep generative models including autoregressive…
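The comparison between multi-source and single-source training can be illustrated with a toy simulation. The sketch below is not the paper's construction: it assumes, purely for illustration, one-dimensional Gaussian sources that differ in mean but share a single variance, so that "similarity" means the shared variance. The function names (`tv_gauss`, `average_tv`) and all parameter values are hypothetical. Single-source estimation fits a mean and a variance per source; multi-source estimation fits a mean per source and one pooled variance. Both are scored by the total variation distance to the true source distribution, averaged over sources.

```python
import math
import random


def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def tv_gauss(mu1, s1, mu2, s2, steps=1000):
    """Total variation distance 0.5 * integral |p - q|, via the midpoint rule."""
    lo = min(mu1 - 8 * s1, mu2 - 8 * s2)
    hi = max(mu1 + 8 * s1, mu2 + 8 * s2)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += abs(gauss_pdf(x, mu1, s1) - gauss_pdf(x, mu2, s2))
    return min(1.0, 0.5 * total * h)


def average_tv(num_sources=8, n_per_source=5, sigma=1.0, trials=40, seed=0):
    """Return (single_source_avg_tv, multi_source_avg_tv) over random trials.

    Assumption (illustrative only): every source is N(mu_k, sigma^2) with a
    source-specific mean and a variance shared by all sources.
    """
    rng = random.Random(seed)
    single_sum = 0.0
    multi_sum = 0.0
    for _ in range(trials):
        mus = [rng.uniform(-3.0, 3.0) for _ in range(num_sources)]
        data = [[rng.gauss(m, sigma) for _ in range(n_per_source)] for m in mus]

        # Per-source maximum likelihood estimate of the mean.
        mu_hat = [sum(xs) / len(xs) for xs in data]
        # Per-source sum of squared residuals around the estimated mean.
        sq = [sum((x - m) ** 2 for x in xs) for xs, m in zip(data, mu_hat)]

        # Single-source: each source estimates its own standard deviation.
        s_single = [max(math.sqrt(q / n_per_source), 1e-12) for q in sq]
        # Multi-source: one standard deviation pooled across all sources.
        s_pooled = math.sqrt(sum(sq) / (num_sources * n_per_source))

        for k in range(num_sources):
            single_sum += tv_gauss(mus[k], sigma, mu_hat[k], s_single[k])
            multi_sum += tv_gauss(mus[k], sigma, mu_hat[k], s_pooled)

    denom = trials * num_sources
    return single_sum / denom, multi_sum / denom


if __name__ == "__main__":
    single, multi = average_tv()
    print(f"single-source average TV: {single:.4f}")
    print(f"multi-source average TV:  {multi:.4f}")
```

In this toy setting the pooled variance estimate draws on every source's samples, so its error shrinks as sources are added, while the per-source mean error is unchanged. That is one simple way shared structure can lower the average estimation error; the paper's bounds are stated in terms of the bracketing number and cover more general model classes.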

Code & Models

Repositories

ml-gsai/multi-source-gm (PyTorch, official)

Videos

A Theory for Conditional Generative Modeling on Multiple Data Sources · SlidesLive

Taxonomy

Topics: Advanced Database Systems and Queries