Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots

Jinyi Hu; Xu Han; Xiaoyuan Yi; Yutong Chen; Wenhao Li; Zhiyuan Liu; Maosong Sun

arXiv:2305.11540 · cs.CV · May 22, 2023 · 1 citation


TL;DR

This paper introduces IAP, a method that efficiently transfers English Stable Diffusion to Chinese by aligning Chinese semantics with English semantics in CLIP's embedding space, using images as pivots and requiring only a small fraction of the usual training data.

Contribution

IAP is a simple, effective approach that uses images as pivots to align Chinese and English semantics in CLIP, transferring an English diffusion model to Chinese by training only a separate Chinese text encoder instead of retraining the whole model.

Findings

01

Outperforms strong Chinese diffusion models with only 5-10% as much training data

02

Establishes efficient connections between Chinese, English, and visual semantics in CLIP

03

Improves image generation quality with direct Chinese prompts

Abstract

Diffusion models have made impressive progress in text-to-image synthesis. However, training such large-scale models (e.g., Stable Diffusion) from scratch requires high computational costs and massive numbers of high-quality text-image pairs, which is unaffordable for other languages. To handle this challenge, we propose IAP, a simple but effective method to transfer English Stable Diffusion into Chinese. IAP optimizes only a separate Chinese text encoder, with all other parameters fixed, to align the Chinese semantic space with the English one in CLIP. To achieve this, we treat images as pivots and minimize the distance between the attentive features produced by cross-attention between the images and each language. In this way, IAP efficiently establishes connections among Chinese, English and visual semantics in CLIP's embedding space, advancing the quality of the generated image with…
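The image-as-pivot objective in the abstract can be illustrated with a minimal pure-Python sketch. It assumes that image features act as the attention queries (the pivot), that each language's token embeddings serve as both keys and values with no learned projections, and that the distance is mean squared error; `cross_attention` and `pivot_alignment_loss` are hypothetical names, not the authors' code. In the method as described, only the Chinese text encoder would be updated to reduce this loss, while the English path and all other parameters stay fixed.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention on plain lists of vectors.

    queries: n_q vectors of size d (here: image features, the pivot).
    keys, values: n_k vectors of size d (here: one language's token embeddings).
    Returns n_q output vectors. Simplification: no learned projections.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs


def mean_squared_error(a, b):
    # Average squared difference over every element of two equally shaped matrices.
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def pivot_alignment_loss(image_feats, en_tokens, zh_tokens):
    # Hypothetical IAP-style objective: the same image queries attend over the
    # English tokens (frozen encoder, the target) and the Chinese tokens
    # (trainable encoder); training would push the two attentive features together.
    en_attended = cross_attention(image_feats, en_tokens, en_tokens)
    zh_attended = cross_attention(image_feats, zh_tokens, zh_tokens)
    return mean_squared_error(zh_attended, en_attended)


if __name__ == "__main__":
    image = [[1.0, 0.0], [0.0, 1.0]]                  # 2 image features, d = 2
    english = [[0.5, -0.2], [0.1, 0.9], [-0.3, 0.4]]  # 3 English token embeddings
    chinese = [[0.2, 0.2], [0.7, -0.5]]               # 2 Chinese token embeddings
    print(pivot_alignment_loss(image, english, english))  # prints 0.0
    print(pivot_alignment_loss(image, english, chinese))  # prints a positive value
```

Because `cross_attention` returns one row per image query, the English and Chinese prompts in this sketch can have different token counts (three and two above) and still yield features of the same shape, which is what makes the image a convenient pivot here. A fuller version would likely use learned query, key and value projections and backpropagate the loss into the Chinese encoder only.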


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Contrastive Language-Image Pre-training · Diffusion · ALIGN