Consistent estimation of generative model representations in the data kernel perspective space
Aranyak Acharyya, Michael W. Trosset, Carey E. Priebe, Hayden S. Helm

TL;DR
This paper introduces a theoretical framework for consistently estimating and comparing embedding-based representations of generative models, particularly as the numbers of models and queries grow.
Contribution
It provides new theoretical conditions that guarantee consistent estimation of generative model embeddings when both the query set and the number of models grow.
Findings
Established sufficient conditions for consistent estimation of model embeddings
Analyzed the behavior of model embeddings as the query set and the number of models grow
Provided insights into comparing models in high-dimensional embedding spaces
Abstract
Generative models, such as large language models and text-to-image diffusion models, produce relevant information when presented with a query. Different models may produce different information when presented with the same query. As the landscape of generative models evolves, it is important to develop techniques to study and analyze differences in model behaviour. In this paper we present novel theoretical results for embedding-based representations of generative models in the context of a set of queries. In particular, we establish sufficient conditions for the consistent estimation of the model embeddings in situations where the query set and the number of models grow.
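The abstract does not spell out how the model embeddings are built. The sketch below assumes the construction commonly associated with a data kernel perspective space: each sampled response is mapped to a vector by some fixed embedding function (taken as given here), replicate embeddings are averaged per query to give an m x p matrix per model, models are compared with a Frobenius distance scaled by 1/sqrt(m), and classical multidimensional scaling (CMDS) turns the distance matrix into low-dimensional model embeddings. The scaling, the power-iteration eigensolver, and every function name (`mean_over_replicates`, `model_distance`, `classical_mds`, `dkps_embedding`) are illustrative assumptions, not the paper's definitions.

```python
import math
import random


def mean_over_replicates(model_responses):
    """Average replicate response embeddings for each query.

    Assumption: model_responses[j][k] is a precomputed embedding vector of the
    k-th sampled response to query j. Returns one mean vector per query (m x p).
    """
    means = []
    for replicates in model_responses:
        r = len(replicates)
        p = len(replicates[0])
        means.append([sum(vec[t] for vec in replicates) / r for t in range(p)])
    return means


def model_distance(X_a, X_b):
    """Frobenius distance between two m x p mean matrices, scaled by 1/sqrt(m) (assumed scaling)."""
    m = len(X_a)
    sq = sum((a - b) ** 2 for row_a, row_b in zip(X_a, X_b) for a, b in zip(row_a, row_b))
    return math.sqrt(sq / m)


def classical_mds(D, d, iters=1000, seed=0):
    """Classical MDS: top-d eigenpairs of B = -1/2 J D^2 J via power iteration with deflation."""
    n = len(D)
    sq = [[D[i][k] ** 2 for k in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double centering; sq is symmetric, so column means equal row means.
    B = [[-0.5 * (sq[i][k] - row_mean[i] - row_mean[k] + grand) for k in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * d for _ in range(n)]
    for dim in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(B[i][k] * v[k] for k in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining spectrum is numerically zero
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm  # for a PSD B, ||Bv|| converges to the top eigenvalue
        if lam <= 0.0:
            break  # no positive eigenvalues left; remaining coordinates stay 0
        for i in range(n):
            coords[i][dim] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next-largest eigenpair.
        B = [[B[i][k] - lam * v[i] * v[k] for k in range(n)] for i in range(n)]
    return coords


def dkps_embedding(all_model_responses, d):
    """Hypothetical pipeline: responses -> per-model mean matrices -> distances -> CMDS."""
    Xs = [mean_over_replicates(resp) for resp in all_model_responses]
    n = len(Xs)
    D = [[model_distance(Xs[i], Xs[k]) for k in range(n)] for i in range(n)]
    return D, classical_mds(D, d)


models = [
    [[[0.0], [0.0]], [[0.0], [0.0]]],  # model 0: 2 queries, 2 replicates, p = 1
    [[[0.0], [0.0]], [[0.0], [0.0]]],  # model 1: identical to model 0
    [[[1.0], [3.0]], [[2.0], [2.0]]],  # model 2: mean response 2.0 on both queries
]
D, Z = dkps_embedding(models, d=1)
print(D[0][2])  # 2.0
```

In this toy case models 0 and 1 land on the same point and model 2 sits at distance 2 from them, so the one-dimensional embedding reproduces the model distances exactly. The paper's results concern how such estimated embeddings behave as the query set and the number of models grow; this example only illustrates the mechanics.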
