Consistent estimation of generative model representations in the data kernel perspective space

Aranyak Acharyya; Michael W. Trosset; Carey E. Priebe; Hayden S. Helm

arXiv:2409.17308 · cs.LG · January 20, 2025

Open Access

TL;DR

This paper introduces a theoretical framework for consistently estimating embedding-based representations of generative models, and for comparing models through those embeddings, as the number of models and the size of the query set grow.

Contribution

It establishes sufficient conditions under which generative model embeddings are consistently estimated as both the query set and the number of models grow.

Findings

01. Established sufficient conditions for consistent estimation

02. Analyzed behavior of model embeddings as data scales

03. Provided insights into model comparison in high-dimensional spaces

Abstract

Generative models, such as large language models and text-to-image diffusion models, produce relevant information when presented a query. Different models may produce different information when presented the same query. As the landscape of generative models evolves, it is important to develop techniques to study and analyze differences in model behaviour. In this paper we present novel theoretical results for embedding-based representations of generative models in the context of a set of queries. In particular, we establish sufficient conditions for the consistent estimation of the model embeddings in situations where the query set and the number of models grow.
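To make "embedding-based representations of generative models in the context of a set of queries" concrete, here is a minimal Python sketch of one way such a representation could be computed. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `1/m` normalization of the Frobenius distance, the use of classical multidimensional scaling, and the synthetic "embedded responses" are all assumptions introduced here.

```python
import numpy as np

def mean_response_embeddings(responses):
    """Average embedded responses over replicates.

    responses: array of shape (n_models, n_queries, n_replicates, dim),
    holding already-embedded responses (the embedding function itself is
    assumed to exist elsewhere). Returns shape (n_models, n_queries, dim).
    """
    return np.asarray(responses, dtype=float).mean(axis=2)

def pairwise_model_distances(mean_emb):
    """Frobenius distance between per-model mean-embedding matrices,
    scaled by the number of queries (assumed normalization)."""
    n_models, n_queries, _ = mean_emb.shape
    flat = mean_emb.reshape(n_models, -1)
    diffs = flat[:, None, :] - flat[None, :, :]
    return np.linalg.norm(diffs, axis=-1) / n_queries

def classical_mds(D, d=2):
    """Classical multidimensional scaling of a distance matrix into R^d."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # doubly centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:d]        # indices of the d largest
    lam = np.clip(eigvals[top], 0.0, None)     # guard against tiny negatives
    return eigvecs[:, top] * np.sqrt(lam)

# Synthetic stand-in for embedded responses: 3 models, 4 queries,
# 5 replicates, 3-dimensional embeddings. Models 0 and 1 behave alike;
# model 2 is shifted away from them.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 3))
responses = np.stack([
    base[:, None, :] + 0.01 * rng.normal(size=(4, 5, 3)),
    base[:, None, :] + 0.01 * rng.normal(size=(4, 5, 3)),
    base[:, None, :] + 5.0 + 0.01 * rng.normal(size=(4, 5, 3)),
])

D = pairwise_model_distances(mean_response_embeddings(responses))
Z = classical_mds(D, d=2)   # one 2-dimensional point per model
```

In this sketch each row of `Z` is one model's representation. Models that respond similarly to the query set land close together, and increasing the number of replicates or queries is the regime in which questions of consistent estimation arise.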

Taxonomy

Topics: Computational Physics and Python Applications · Distributed and Parallel Computing Systems · Scientific Computing and Data Management

Methods: Sparse Evolutionary Training · Diffusion