Representative Language Generation

Charlotte Peale; Vinod Raman; Omer Reingold

arXiv:2505.21819·cs.CL·May 29, 2025

TL;DR

This paper extends the theoretical framework of language generation to address diversity and bias. It proposes a formal notion of representative generation, which requires a generator's outputs to proportionally represent groups of interest from the training data.

Contribution

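It introduces representative generation and characterizes representative uniform and non-uniform generation through a new combinatorial quantity, the group closure dimension. It also analyzes both the information-theoretic feasibility and the computational limits of representative generation in the limit.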

Findings

01

Representative generation in the limit is information-theoretically feasible for countably infinite hypothesis classes and collections of groups, under certain conditions.

02

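A negative computability result: representative generation in the limit cannot, in general, be achieved using only membership queries, in contrast to Kleinberg et al.'s (2024) positive results for standard generation.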

03

Provides a rigorous foundation for diverse and representative generative models.

Abstract

We introduce "representative generation," extending the theoretical framework for generation proposed by Kleinberg et al. (2024) and formalized by Li et al. (2024), to additionally address diversity and bias concerns in generative models. Our notion requires outputs of a generative model to proportionally represent groups of interest from the training data. We characterize representative uniform and non-uniform generation, introducing the "group closure dimension" as a key combinatorial quantity. For representative generation in the limit, we analyze both information-theoretic and computational aspects, demonstrating feasibility for countably infinite hypothesis classes and collections of groups under certain conditions, but proving a negative result for computability using only membership queries. This contrasts with Kleinberg et al.'s (2024) positive results for standard generation in…
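To make the proportionality requirement concrete, here is a minimal sketch, not the paper's formal definition, of checking whether a finite batch of generated strings covers a collection of groups in roughly the same proportions as the training data. The modeling of groups as Python predicates, the tolerance `alpha`, and the helper names `group_proportions` and `is_representative` are hypothetical choices made for illustration. The sketch checks only proportions and ignores the validity and novelty requirements of the underlying generation framework. The paper's definition, including how proportions are measured over a generator's outputs, may differ.

```python
from collections.abc import Callable, Sequence

# Hypothetical modeling choice: a "group" is a membership predicate over strings.
Group = Callable[[str], bool]


def group_proportions(samples: Sequence[str], groups: Sequence[Group]) -> list[float]:
    """Return the fraction of samples that fall in each group (groups may overlap)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    return [sum(1 for s in samples if g(s)) / n for g in groups]


def is_representative(
    training: Sequence[str],
    outputs: Sequence[str],
    groups: Sequence[Group],
    alpha: float,
) -> bool:
    """True if, for every group, the share of outputs in that group is within
    alpha of that group's share of the training data (hypothetical criterion)."""
    train_p = group_proportions(training, groups)
    out_p = group_proportions(outputs, groups)
    return all(abs(p - q) <= alpha for p, q in zip(train_p, out_p))


if __name__ == "__main__":
    # Two illustrative groups: strings starting with "a" and strings starting with "b".
    groups = [lambda s: s.startswith("a"), lambda s: s.startswith("b")]
    training = ["apple", "avocado", "banana", "blueberry"]  # each group has share 0.5
    print(is_representative(training, ["apricot", "bean"], groups, alpha=0.1))    # True
    print(is_representative(training, ["apricot", "almond"], groups, alpha=0.1))  # False
```

In the second call, every output falls in the first group, so that group's share rises from 0.5 to 1.0. This exceeds the tolerance even though each output string is individually plausible, which is the kind of skew a representativeness requirement is meant to rule out.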

Videos

Representative Language Generation (SlidesLive)

Taxonomy

Topics: Machine Learning and Algorithms · Generative Adversarial Networks and Image Synthesis · Computability, Logic, AI Algorithms