AI-based System for Transforming text and sound to Educational Videos
M. E. ElAlami, S. M. Khater, M. El. R. Rehan

TL;DR
This paper presents a novel AI system that transforms text and speech into educational videos using GANs, combining speech recognition, image generation, and video synthesis for improved educational content creation.
Contribution
The paper introduces a new GAN-based framework for generating full educational videos from text or speech, integrating multiple AI models for enhanced quality and semantic accuracy.
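For reference, the standard conditional GAN objective that frameworks of this kind build on is shown below. A generator G maps noise z and a condition c to a sample, and a discriminator D scores whether a sample is real given c. Treating c as an embedding of the input text is an assumption: this summary does not state the exact loss or conditioning scheme the authors use.

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x \mid c)\big]
+ \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z \mid c) \mid c)\big)\big]
```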
Findings
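Achieved a Fréchet Inception Distance (FID) of 28.75. FID is a distance, not a percentage, and lower values indicate generated frames closer to real ones, suggesting high visual quality.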
Outperformed existing video-generation models such as TGAN, MoCoGAN, and TGANs-C.
Produced fully interactive educational videos from input text or speech.
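The FID figure above follows the standard definition, which compares Gaussian fits of feature embeddings (conventionally Inception-v3 activations) for real and generated images. Here \mu_r, \Sigma_r are the mean and covariance of real-image features and \mu_g, \Sigma_g those of generated frames. Identical statistics give 0, so smaller is better. The feature extractor, sample counts, and reference set used in the paper are not given in this summary.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^{2}
+ \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```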
Abstract
Technological advances have produced methods that can generate educational videos from input text or sound. Deep learning techniques for image and video generation have recently been widely explored, particularly in education. However, generating video content from conditional inputs such as text or speech remains challenging. In this paper, we introduce a novel approach to educational content creation based on Generative Adversarial Networks (GANs), which generate content frame by frame and can produce complete educational videos. The proposed system is structured into three main phases. In the first phase, the input (either text or speech) is converted to text, with speech transcribed using speech recognition. In the second phase, key terms are extracted and relevant images are generated using models such as CLIP and diffusion models to improve visual quality and semantic alignment. In the…
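The phases described in the abstract can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation. Every function name is hypothetical. `transcribe` is a stub standing in for a speech recognizer. Key terms come from simple stopword-filtered term frequency. `best_image` mimics CLIP-style reranking by cosine similarity over made-up embedding vectors. The final step merely holds each image for a fixed number of frames, whereas the paper's third phase (truncated above) presumably performs GAN-based video synthesis.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Toy stopword list (assumption; a real system would use a proper NLP toolkit)
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "by",
             "into", "that", "its", "it", "with", "for", "on"}


def transcribe(audio: bytes) -> str:
    # Hypothetical stub: a real system would call a speech-recognition model here.
    raise NotImplementedError("plug in a speech recognizer")


def to_text(text: str | None = None, audio: bytes | None = None) -> str:
    # Phase 1: text passes through unchanged; speech is transcribed.
    if text is not None:
        return text
    if audio is not None:
        return transcribe(audio)
    raise ValueError("need text or audio input")


def extract_key_terms(text: str, k: int = 3) -> list[str]:
    # Phase 2a: rank non-stopword terms by frequency (stand-in for the paper's extractor).
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    # Ties are broken alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_image(term_vec: list[float], candidates: dict[str, list[float]]) -> str:
    # Phase 2b: pick the candidate image whose embedding best matches the term,
    # imitating CLIP-score reranking of generated images (toy vectors, not CLIP).
    return max(candidates, key=lambda name: cosine(term_vec, candidates[name]))


def build_frame_schedule(images: list[str], seconds_each: float = 2.0,
                         fps: int = 24) -> list[str]:
    # Phase 3 placeholder: hold each image for a fixed number of frames.
    frames_per_image = round(seconds_each * fps)
    return [img for img in images for _ in range(frames_per_image)]


lesson = to_text(text=("Photosynthesis converts light into chemical energy. "
                       "Chlorophyll absorbs light, and photosynthesis produces glucose."))
terms = extract_key_terms(lesson, k=2)  # -> ['light', 'photosynthesis']
term_embeddings = {"light": [1.0, 0.0], "photosynthesis": [0.0, 1.0]}
candidates = {"sun.png": [0.9, 0.1], "leaf.png": [0.2, 0.8]}
picks = [best_image(term_embeddings[t], candidates) for t in terms]  # -> ['sun.png', 'leaf.png']
frames = build_frame_schedule(picks)  # 48 frames per image at 24 fps
print(terms, picks, len(frames))
```

Keeping each phase behind its own function mirrors the modular structure the abstract describes, so any stub can be swapped for a real speech recognizer, CLIP encoder, or generative model without touching the others.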
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization
