LLMs Meet Multimodal Generation and Editing: A Survey
Yingqing He, Zhaoyang Liu, Jingye Chen, Zeyue Tian, Hongyu Liu, Xiaowei Chi, Runtao Liu, Ruibin Yuan, Yazhou Xing, Wenhai Wang, Jifeng Dai, Yong Zhang, Wei Xue, Qifeng Liu, Yike Guo, Qifeng Chen

TL;DR
This survey reviews recent progress in multimodal generation and editing using large language models, covering technical methods, datasets, applications, and safety considerations across various media types.
Contribution
It provides a comprehensive overview of multimodal generation and editing techniques involving LLMs, categorizing methods, analyzing technical components, and discussing future directions.
Findings
Summarizes milestone works in multimodal generation and editing.
Categorizes methods into LLM-based and CLIP/T5-based approaches.
Discusses tool-augmented multimodal agents and safety advancements.
Abstract
With the recent advancement in large language models (LLMs), there is a growing interest in combining LLMs with multimodal learning. Previous surveys of multimodal large language models (MLLMs) mainly focus on multimodal understanding. This survey elaborates on multimodal generation and editing across various domains, comprising image, video, 3D, and audio. Specifically, we summarize the notable advancements with milestone works in these fields and categorize these studies into LLM-based and CLIP/T5-based methods. Then, we summarize the various roles of LLMs in multimodal generation and exhaustively investigate the critical technical components behind these methods and the multimodal datasets utilized in these studies. Additionally, we dig into tool-augmented multimodal agents that can leverage existing generative models for human-computer interaction. Lastly, we discuss the…
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices · Semantic Web and Ontologies
