PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference
Qirui Wang, Qi Guo, Yiding Sun, Junkai Yang, Dongxu Zhang, Shanmin Pang, Qing Guo

TL;DR
PersonalQ introduces a unified approach for selecting and quantizing personalized diffusion models, improving inference efficiency and intent alignment while maintaining high fidelity in personalized text-to-image generation.
Contribution
The paper proposes a framework that links checkpoint selection and quantization via trigger tokens, with the aim of improving how personalized diffusion models are served.
Findings
PersonalQ improves intent alignment over baseline retrieval methods.
Trigger-Aware Quantization (TAQ) offers better compression-quality trade-offs.
The framework enables scalable, high-fidelity serving of personalized checkpoints.
Abstract
Personalized text-to-image generation lets users fine-tune diffusion models into repositories of concept-specific checkpoints, but serving these repositories efficiently is difficult for two reasons: natural-language requests are often ambiguous and can be misrouted to visually similar checkpoints, and standard post-training quantization can distort the fragile representations that encode personalized concepts. We present PersonalQ, a unified framework that connects checkpoint selection and quantization through a shared signal -- the checkpoint's trigger token. Check-in performs intent-aligned selection by combining intent-aware hybrid retrieval with LLM-based reranking over checkpoint context and asks a brief clarification question only when multiple intents remain plausible; it then rewrites the prompt by inserting the selected checkpoint's canonical trigger. Complementing this,…
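The Check-in flow described above (hybrid retrieval, a clarification question when several intents remain plausible, then prompt rewriting with the trigger token) can be illustrated with a small toy. This is only a sketch under stated assumptions, not the authors' implementation: the `Checkpoint` fields, the `check_in` function, the `alpha` and `margin` parameters, and the choice to prepend the trigger are all hypothetical. A bag-of-words cosine stands in for learned embeddings, and the LLM-based reranking step is omitted entirely.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Checkpoint:
    name: str         # hypothetical checkpoint identifier
    trigger: str      # canonical trigger token, e.g. "<corgi>"
    description: str  # free-text metadata used for retrieval


def tokens(text):
    return text.lower().split()


def lexical_score(query, doc):
    # Jaccard overlap of word sets: the sparse half of the hybrid score.
    q, d = set(tokens(query)), set(tokens(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def cosine_score(query, doc):
    # Bag-of-words cosine, standing in for a learned embedding similarity.
    q, d = Counter(tokens(query)), Counter(tokens(doc))
    dot = sum(q[w] * d[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def check_in(query, repo, alpha=0.5, margin=0.05):
    """Return (checkpoint, rewritten_prompt), or (None, question) if ambiguous."""
    if not repo:
        raise ValueError("repository is empty")
    # Hybrid score: weighted mix of the lexical and cosine similarities.
    scored = sorted(
        (
            (
                alpha * lexical_score(query, c.description)
                + (1 - alpha) * cosine_score(query, c.description),
                c,
            )
            for c in repo
        ),
        key=lambda pair: pair[0],
        reverse=True,
    )
    best_score, best = scored[0]
    # Assumed ambiguity rule: ask only when the top two scores are close.
    if len(scored) > 1 and best_score - scored[1][0] < margin:
        runner_up = scored[1][1]
        return None, f"Did you mean {best.name} or {runner_up.name}?"
    # Assumed rewrite rule: prepend the canonical trigger to the prompt.
    return best, f"{best.trigger} {query}"
```

In a real system the reranking stage would sit between scoring and the ambiguity check, and the margin test is just one simple way to decide when a clarification question is worth asking.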
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
* Deploying personalized text-to-image models is an interesting and practical topic, and the proposed solution is quite practical. Such work is appreciated because it fills the gap between personalized model deployment and large-scale model serving.
* The manuscript is well written and easy to understand. The overall structure of the article is clear, and the authors use many charts and figures to illustrate the content.
* The proposed Repo-Prompt benchmark is appreciated. It will serve as im
1. Some related references are missing; the authors should consider the following related work in the manuscript.
   * https://arxiv.org/html/2406.18820v1
   * https://arxiv.org/html/2504.15298v1
2. In the experiments, the authors consider only Stable Diffusion 1.5. How does the method perform on other text-to-image models?
3. The Check-in part relies on large language models for reasoning, which may introduce latency in high-concurrency scenarios. In addition, the authors define the "Intent"
- Practical Problem: The paper tackles a highly relevant, real-world challenge. The "Check-in" methodology for handling massive repositories and the analysis of trigger token sensitivity are both practical and insightful.
- New Benchmark: The proposal of the "Repo-Prompt" benchmark is a valuable contribution to the community, as it provides a standardized way to evaluate retrieval methods in this domain.
- Intuitive Method: The core idea of TAQ, preserving precision for critical trigger words a
Despite the promising contribution, this paper suffers from a critical, overriding flaw in its presentation, along with several other significant technical weaknesses.
1. Fundamental Weakness: Critical Violation of Formatting Guidelines. The primary reason for the 'Presentation: 1 (Poor)' score and 'Reject' rating is a clear and severe violation of ICLR 2026 formatting policy. The submission appears to have manipulated page margins and/or horizontal spacing to fit significantly more content int
1. The paper addresses a real and practical scenario where user intent must be clarified and the most appropriate LoRAs selected to maximize alignment with user intention as well as image quality.
2. The proposed TAQ strategy appears to be a simple adjustment that enables an effective trade-off between memory constraints and generated image quality.
1. The novelty of the Check-In mechanism as a multi-stage strategy is not clear and lacks explicit comparison with the related work Stylus (Luo et al., 2024). Specifically, it is unclear what changes make the proposed method superior in quality to previous approaches. In Stylus, the authors also employ a multi-stage strategy, while in lines 405–407 it is stated that only cosine-based similarity was used to retrieve relevant LoRAs and the “Composer” stage was omitted, which in their work was also
1. The writing is clear and well organized, with figures that effectively convey the pipeline and make the method easy to follow.
2. The methodology is presented in a structured and understandable way.
1. The paper combines two loosely related parts—checkpoint selection (Check-in) and quantization (TAQ)—without providing a necessary algorithmic or conceptual connection between them. The first focuses on metadata-based retrieval and prompt reasoning, while the second focuses on bit-width compression; there is no shared signal, dependency, or experiment that couples the two. As a result, the work feels incoherent and reads as a forced stitching of two independent topics rather than a unified fra
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
