Mitigating Semantic Collapse in Generative Personalization with Test-Time Embedding Adjustment

Anh Bui; Trang Vu; Trung Le; Junae Kim; Tamas Abraham; Rollin Omari; Amar Kaur; Dinh Phung

arXiv:2506.22685·cs.LG·February 26, 2026

Open Access · 3 Reviews

TL;DR

This paper addresses the semantic collapse problem in generative personalization by proposing a training-free inference-time embedding adjustment method that preserves the original meaning of learned visual concepts.

Contribution

The authors introduce a novel, training-free approach to adjust embeddings at inference time, effectively mitigating semantic collapse in personalized generative models.

Findings

01 Significant improvement in text-image alignment across various personalization methods.

02 The proposed method is simple, training-free, and broadly applicable.

03 Effective in maintaining semantic richness and diversity in generated images.

Abstract

In this paper, we investigate the semantic collapsing problem in generative personalization, an under-explored topic where the learned visual concept ( $V$ ) gradually shifts from its original textual meaning and comes to dominate other concepts in multi-concept input prompts. This issue not only reduces the semantic richness of complex input prompts like "a photo of $V$ wearing glasses and playing guitar" into simpler, less contextually rich forms such as "a photo of $V$ " but also leads to simplified output images that fail to capture the intended concept. We identify the root cause as unconstrained optimisation, which allows the learned embedding $V$ to drift arbitrarily in the embedding space, both in direction and magnitude. To address this, we propose a simple yet effective training-free method that adjusts the magnitude and direction of pre-trained embedding at inference time,…
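The abstract describes adjusting the magnitude and direction of a learned embedding at inference time. The sketch below shows one way such an adjustment could look. It is not the authors' implementation. It assumes a reference embedding is available (for example, the embedding of the original concept word), uses spherical interpolation to rotate the learned direction toward that reference, and then rescales the norm. The names `adjust_embedding`, `rotate`, and `scale` are hypothetical, and how they relate to the α and β mentioned in the reviews is not stated on this page.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def slerp(a, b, t):
    """Spherical interpolation between unit vectors a and b (t in [0, 1])."""
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    theta = math.acos(dot)
    s = math.sin(theta)
    if s < 1e-8:
        # Vectors are (anti)parallel: the rotation is undefined, so keep a.
        return list(a)
    wa = math.sin((1.0 - t) * theta) / s
    wb = math.sin(t * theta) / s
    return [wa * x + wb * y for x, y in zip(a, b)]


def adjust_embedding(learned, reference, rotate=0.5, scale=1.0):
    """Hypothetical test-time adjustment of a learned concept embedding.

    rotate: fraction of the angle to move toward the reference direction
            (assumed parameter; 0 keeps the learned direction, 1 matches
            the reference direction).
    scale:  target norm as a multiple of the reference norm (assumed
            parameter).
    """
    learned_norm = norm(learned)
    ref_norm = norm(reference)
    if learned_norm == 0.0 or ref_norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    u = [x / learned_norm for x in learned]
    r = [x / ref_norm for x in reference]
    direction = slerp(u, r, rotate)  # direction adjustment
    target = scale * ref_norm        # magnitude adjustment
    return [target * x for x in direction]
```

Calling `adjust_embedding(v_learned, v_reference)` returns a new vector and leaves both inputs unchanged, so under these assumptions the adjustment can be applied per prompt without touching model weights or the stored embedding.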

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 4

Strengths

The work introduces the Semantic Collapsing Problem (SCP) in generative personalization, an under-explored issue, and rigorously identifies unconstrained optimization as its root cause, with solid empirical evidence across textual and image spaces. The proposed TEA method is lightweight and practical: it requires no additional training, avoids modifying model weights, and generalizes well across diverse frameworks (e.g., Textual Inversion, DreamBooth) and architectures (Stable Diffusion, Flux), mak

Weaknesses

TEA relies on fixed hyperparameters (α=0.2, β=1.5) across all prompts, which may not be optimal for diverse scenarios.

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. High quality of writing and presentation. 2. The introduction of TEA, a method that is training-free and easily transferable. 3. An insightful analysis of the "Semantic Collapsing Problem" within generative personalization and the mechanics of anti-dreambooth methods.

Weaknesses

1. The paper lacks comparisons to recent, mainstream works, particularly those in the in-context generation paradigm (e.g., OminiControl[1], FLUX.1 Kontext[2], and Diffusion Self-Distillation[3]). 2. The evaluation is insufficient. It should be strengthened by including MLLM-based benchmarks, such as DreamBench++[4]. 3. Table 1 shows a performance decrease in Reference and Image alignment scores. This is presumably because TEA's objective focuses exclusively on fidelity between the learned and o

Reviewer 03 · Rating 4 · Confidence 4

Strengths

The identification and empirical analysis of SCP are interesting. TEA is lightweight, requires no retraining, and is compatible with numerous existing frameworks. The paper is well-structured and easy to follow.

Weaknesses

While the Test-time Embedding Adjustment (TEA) method is practical and easy to deploy, its technical contribution is relatively modest. The core mechanism—adjusting the magnitude and direction of an embedding vector—is a straightforward application of existing vector space operations, lacking the novelty of a more transformative technique. The approach does not introduce new learning paradigms or architectural innovations, but rather applies a post-hoc correction to the outputs of existing model
