Instruction-Guided Editing Controls for Images and Multimedia: A Survey in LLM era
Thanh Tam Nguyen, Zhao Ren, Trinh Pham, Thanh Trung Huynh, Phi Le Nguyen, Hongzhi Yin, Quoc Viet Hung Nguyen

TL;DR
This survey reviews instruction-guided editing techniques for images and multimedia enabled by large language models and multimodal learning, highlighting their potential to democratize content creation and manipulation.
Contribution
It synthesizes over 100 publications to provide a comprehensive overview of LLM-empowered visual editing methods and discusses future challenges in the field.
Findings
Instruction-based editing enhances accessibility for non-experts.
Multimodal models enable precise, fine-grained visual modifications.
Applications span fashion, 3D scenes, and video synthesis.
Abstract
The rapid advancement of large language models (LLMs) and multimodal learning has transformed digital content creation and manipulation. Traditional visual editing tools require significant expertise, limiting accessibility. Recent strides in instruction-based editing have enabled intuitive interaction with visual content, using natural language as a bridge between user intent and complex editing operations. This survey provides an overview of these techniques, focusing on how LLMs and multimodal models empower users to achieve precise visual modifications without deep technical knowledge. By synthesizing over 100 publications, we explore methods from generative adversarial networks to diffusion models, examining multimodal integration for fine-grained content control. We discuss practical applications across domains such as fashion, 3D scene manipulation, and video synthesis,…
Taxonomy
Topics: Advanced Data Storage Technologies · Multimedia Communication and Technology · Video Coding and Compression Technologies
Methods: Diffusion
