An Implementation of Multimodal Fusion System for Intelligent Digital Human Generation
Yingjie Zhou, Yaodong Chen, Kaiyue Bi, Lian Xiong, Hui Liu

TL;DR
This paper presents an AI-driven multimodal digital human generation system that integrates text, speech, and image inputs to produce realistic digital humans efficiently, reducing manual effort and development time.
Contribution
It introduces a comprehensive system combining multimodal fusion, large language models, image transformation, and digital content synthesis for digital human creation.
Findings
System effectively generates digital humans from multimodal inputs
Enhances user experience with style transfer and super-resolution
Demonstrates practical implementation with open-source code
Abstract
With the rapid development of artificial intelligence (AI), digital humans have attracted increasing attention and are expected to find wide application across several industries. However, most existing digital humans still rely on manual modeling by designers, which is a cumbersome process with a long development cycle. Given the rise of digital humans, there is therefore an urgent need for an AI-based digital human generation system that improves development efficiency. This paper proposes an implementation scheme for an intelligent digital human generation system with multimodal fusion. Specifically, text, speech, and image are taken as inputs, and interactive speech is synthesized using a large language model (LLM), voiceprint extraction, and text-to-speech conversion techniques. Then the input image is age-transformed and a suitable image is selected…
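The speech side of the pipeline described in the abstract (LLM reply, voiceprint extraction, text-to-speech) can be sketched as three pluggable stages. This is only a minimal illustration, not the authors' implementation: the class name SpeechPipeline, its field names, and the toy stand-in functions are all hypothetical, and a real system would replace the stand-ins with actual models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpeechPipeline:
    """Chains the three speech-side stages named in the abstract.

    All three callables are hypothetical placeholders; the paper's
    actual models and interfaces are not specified here.
    """
    generate_reply: Callable[[str], str]                 # stand-in for the LLM
    extract_voiceprint: Callable[[bytes], List[float]]   # stand-in for voiceprint extraction
    synthesize: Callable[[str, List[float]], bytes]      # stand-in for text-to-speech

    def run(self, user_text: str, reference_audio: bytes) -> bytes:
        # 1. The LLM turns the user's text into a reply.
        reply = self.generate_reply(user_text)
        # 2. A speaker embedding is extracted from the reference speech.
        voiceprint = self.extract_voiceprint(reference_audio)
        # 3. The reply is synthesized in the reference speaker's voice.
        return self.synthesize(reply, voiceprint)


# Toy stand-ins so the sketch runs without any model (assumptions, not real APIs).
def echo_llm(text: str) -> str:
    return f"You said: {text}"


def mean_byte_voiceprint(audio: bytes) -> List[float]:
    # A one-dimensional "embedding": the mean byte value of the audio.
    return [sum(audio) / max(len(audio), 1)]


def fake_tts(text: str, voiceprint: List[float]) -> bytes:
    # Encodes the text and the voiceprint instead of producing audio.
    return f"{text}|vp={voiceprint[0]:.1f}".encode("utf-8")


pipeline = SpeechPipeline(echo_llm, mean_byte_voiceprint, fake_tts)
result = pipeline.run("hello", bytes([10, 20, 30]))
```

Keeping each stage behind a plain callable means any one of them can be swapped for a real model without touching the other two.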
Taxonomy
Topics: Diverse Topics in Contemporary Research · Human Motion and Animation · Advanced Technologies in Various Fields
