HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts

Xinyu Liu; Yingqing He; Lanqing Guo; Xiang Li; Bu Jin; Peng Li; Yan Li; Chi-Min Chan; Qifeng Chen; Wei Xue; Wenhan Luo; Qifeng Liu; Yike Guo

arXiv:2409.02919·cs.CV·September 10, 2024

TL;DR

HiPrompt introduces a hierarchical, tuning-free prompting method for high-resolution image generation that combines global and local guidance to improve structural quality and reduce artifacts at 4K and higher resolutions.

Contribution

The paper proposes HiPrompt, a novel hierarchical prompting framework that enhances high-resolution image generation without additional tuning, effectively addressing object repetition and structural artifacts.

Findings

1. Outperforms state-of-the-art methods in 4K image generation
2. Reduces object repetition and structural artifacts
3. Maintains coherent local and global semantics

Abstract

The potential for higher-resolution image generation using pretrained diffusion models is immense, yet these models often struggle with object repetition and structural artifacts, especially when scaling to 4K resolution and higher. We find that the cause is that a single prompt, used to generate content at multiple scales, provides insufficient guidance. In response, we propose HiPrompt, a new tuning-free solution that tackles these problems by introducing hierarchical prompts. The hierarchical prompts offer both global and local guidance. Specifically, the global guidance comes from the user input that describes the overall content, while the local guidance uses patch-wise descriptions from MLLMs to guide regional structure and texture generation in detail. Furthermore, during the inverse denoising process, the generated noise is decomposed into low-…
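The pairing of global and local guidance described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `patch_grid`, `hierarchical_prompts`, the patch and stride sizes, and the `describe_patch` callback (a stand-in for an MLLM call) are all hypothetical names and values chosen for illustration. The sketch only shows how each image patch could carry both the user's global prompt and its own local description; how the two are combined during denoising is not covered here.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def patch_grid(width: int, height: int, patch: int, stride: int) -> List[Box]:
    """Cover a width x height canvas with overlapping patch windows.

    Hypothetical helper: the patch and stride values are assumptions,
    not settings taken from the paper.
    """
    def starts(size: int) -> List[int]:
        if size <= patch:
            return [0]
        pos = list(range(0, size - patch + 1, stride))
        if pos[-1] != size - patch:
            pos.append(size - patch)  # make sure the far edge is covered
        return pos

    return [
        (x, y, min(x + patch, width), min(y + patch, height))
        for y in starts(height)
        for x in starts(width)
    ]


def hierarchical_prompts(
    global_prompt: str,
    boxes: List[Box],
    describe_patch: Callable[[Box], str],
) -> List[Dict[str, object]]:
    """Attach the global prompt and a per-patch local prompt to every box.

    `describe_patch` stands in for an MLLM that captions one region;
    here it is an arbitrary callable supplied by the caller.
    """
    return [
        {"box": box, "global": global_prompt, "local": describe_patch(box)}
        for box in boxes
    ]


if __name__ == "__main__":
    boxes = patch_grid(2048, 2048, patch=1024, stride=512)
    # Placeholder describer; a real system would query an MLLM per patch.
    prompts = hierarchical_prompts(
        "a castle on a hill", boxes, lambda b: f"region at ({b[0]}, {b[1]})"
    )
    for entry in prompts[:3]:
        print(entry)
```

Keeping the describer as a callback means the grid logic can be tested without any model, and a real MLLM captioner could be swapped in later without changing the data layout.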

Code & Models

Repositories

Liuxinyv/HiPrompt (PyTorch, official)

Taxonomy

Topics: Video Analysis and Summarization · Advanced Vision and Imaging

Methods: Diffusion · Focus