MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration

Zhichao Wei; Qingkun Su; Long Qin; Weizhi Wang

arXiv:2403.15059 · cs.CV · March 25, 2024 · 1 citation

TL;DR

MM-Diff is a tuning-free framework that generates high-fidelity personalized images of single and multiple subjects in seconds by integrating multi-modal embeddings through a cross-attention mechanism.

Contribution

It introduces a unified, tuning-free approach with a multimodal cross-attention mechanism and cross-attention map constraints for high-fidelity multi-subject image personalization.

Findings

01. Outperforms existing methods in subject fidelity and text consistency.

02. Enables flexible multi-subject image generation without predefined layouts.

03. Operates efficiently in seconds without model retraining.

Abstract

Recent advances in tuning-free personalized image generation based on diffusion models are impressive. However, to improve subject fidelity, existing methods either retrain the diffusion model or infuse it with dense visual embeddings, both of which suffer from poor generalization and efficiency. Also, these methods falter in multi-subject image generation due to the unconstrained cross-attention mechanism. In this paper, we propose MM-Diff, a unified and tuning-free image personalization framework capable of generating high-fidelity images of both single and multiple subjects in seconds. Specifically, to simultaneously enhance text consistency and subject fidelity, MM-Diff employs a vision encoder to transform the input image into CLS and patch embeddings. CLS embeddings are used on the one hand to augment the text embeddings, and on the other hand together with patch embeddings to…
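The first use of the CLS embedding described above, augmenting the text embeddings, can be illustrated with a minimal sketch. The abstract does not say how the fusion is done, so everything specific here is an assumption made for illustration: the hypothetical function names `project` and `augment_text_embeddings`, the additive fusion at a single subject token, and the toy projection matrix. It is not the authors' implementation, and it does not cover the patch-embedding path, which the truncated abstract leaves unspecified.

```python
from typing import List

Vector = List[float]


def project(vec: Vector, weight: List[Vector]) -> Vector:
    """Linear map; `weight` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def augment_text_embeddings(
    text_emb: List[Vector],
    cls_emb: Vector,
    subject_index: int,
    weight: List[Vector],
) -> List[Vector]:
    """Return a copy of `text_emb` with the projected CLS embedding
    added to the token at `subject_index`.

    Assumption: additive fusion at one token position. The source only
    states that CLS embeddings augment the text embeddings.
    """
    projected = project(cls_emb, weight)
    if len(projected) != len(text_emb[subject_index]):
        raise ValueError("projected CLS size must match text embedding size")
    augmented = [list(token) for token in text_emb]  # leave the input untouched
    augmented[subject_index] = [
        t + p for t, p in zip(augmented[subject_index], projected)
    ]
    return augmented


# Toy example: three text tokens of width 2, a CLS embedding of width 3.
text_emb = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
cls_emb = [2.0, 1.0, 0.0]
weight = [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical 3 -> 2 projection
augmented = augment_text_embeddings(text_emb, cls_emb, subject_index=1, weight=weight)
```

In this toy setup only the token at index 1 (standing in for the subject word in the prompt) changes; the other text tokens pass through unmodified, which is one plausible way to keep the rest of the prompt's meaning intact.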

Taxonomy

Topics: Medical Image Segmentation Techniques · AI in cancer detection · Advanced Data Compression Techniques

Methods: Diffusion