HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts
Xinyu Liu, Yingqing He, Lanqing Guo, Xiang Li, Bu Jin, Peng Li, Yan Li, Chi-Min Chan, Qifeng Chen, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo

TL;DR
HiPrompt introduces a hierarchical, tuning-free prompting method for high-resolution image generation that combines global and local guidance to improve structural quality and reduce artifacts at 4K and higher resolutions.
Contribution
The paper proposes HiPrompt, a novel hierarchical prompting framework that enhances high-resolution image generation without additional tuning, effectively addressing object repetition and structural artifacts.
Findings
Outperforms state-of-the-art methods in 4K image generation
Reduces object repetition and structural artifacts
Maintains coherent local and global semantics
Abstract
The potential for higher-resolution image generation using pretrained diffusion models is immense, yet these models often struggle with object repetition and structural artifacts, especially when scaling to 4K resolution and higher. We find that the problem arises because a single prompt is not effective enough to guide generation across multiple scales. In response, we propose HiPrompt, a new tuning-free solution that tackles these problems by introducing hierarchical prompts. The hierarchical prompts offer both global and local guidance. Specifically, the global guidance comes from the user input that describes the overall content, while the local guidance uses patch-wise descriptions from MLLMs to guide the generation of regional structure and texture in detail. Furthermore, during the inverse denoising process, the generated noise is decomposed into low-…
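The pairing of one global prompt with patch-wise local descriptions can be pictured with a minimal Python sketch. It is not the authors' implementation: the names `patch_boxes`, `hierarchical_prompts`, `PatchPrompt`, and the `describe_patch` callable (a stand-in for an MLLM captioner) are hypothetical, as are the patch size and stride. How HiPrompt actually injects these prompts into the denoising process is not covered by the abstract text above, so the sketch stops at assembling the prompt hierarchy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (top, left, bottom, right) in pixels; an assumed convention for this sketch
Box = Tuple[int, int, int, int]


@dataclass
class PatchPrompt:
    box: Box
    global_prompt: str  # user prompt describing the overall content
    local_prompt: str   # patch-wise description (from an MLLM in the paper)


def patch_boxes(height: int, width: int, patch: int, stride: int) -> List[Box]:
    """Tile a canvas with patch x patch windows, row by row.

    The last window along each axis is shifted inward so it stays in bounds.
    """
    def starts(size: int) -> List[int]:
        if size <= patch:
            return [0]
        positions = list(range(0, size - patch, stride))
        positions.append(size - patch)
        return positions

    return [
        (top, left, min(top + patch, height), min(left + patch, width))
        for top in starts(height)
        for left in starts(width)
    ]


def hierarchical_prompts(
    global_prompt: str,
    height: int,
    width: int,
    patch: int,
    stride: int,
    describe_patch: Callable[[Box], str],
) -> List[PatchPrompt]:
    """Attach the shared global prompt and a local description to every patch."""
    return [
        PatchPrompt(box=box, global_prompt=global_prompt, local_prompt=describe_patch(box))
        for box in patch_boxes(height, width, patch, stride)
    ]


if __name__ == "__main__":
    # Placeholder captioner; a real pipeline would query an MLLM on the patch pixels.
    prompts = hierarchical_prompts(
        "a castle on a cliff", 2048, 2048, 1024, 1024,
        lambda box: f"detail at {box[0]},{box[1]}",
    )
    for p in prompts:
        print(p.box, "|", p.global_prompt, "|", p.local_prompt)
```

Keeping the global prompt on every patch record reflects the abstract's point that local guidance supplements, rather than replaces, the user's description of the overall content.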
Taxonomy
Topics: Video Analysis and Summarization · Advanced Vision and Imaging
Methods: Diffusion · Focus
