AI-based System for Transforming text and sound to Educational Videos

M. E. ElAlami; S. M. Khater; M. El. R. Rehan

arXiv:2601.17022 · cs.MM · January 27, 2026

Open Access

TL;DR

This paper presents a novel AI system that transforms text and speech into educational videos using GANs, combining speech recognition, image generation, and video synthesis for improved educational content creation.

Contribution

The paper introduces a new GAN-based framework for generating full educational videos from text or speech, integrating multiple AI models for enhanced quality and semantic accuracy.
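This summary does not state the paper's exact loss functions. As a point of reference only, a GAN trains a generator G against a discriminator D. The standard conditional form of that objective is shown below. Here x is a real frame, z is a noise vector, and c stands in for the text-derived condition (an assumption about how the condition enters). G learns to produce frames that D cannot distinguish from real ones given the same c.

```latex
\min_{G}\max_{D}\;
\mathbb{E}_{(x,c)\sim p_{\text{data}}}\big[\log D(x, c)\big]
+ \mathbb{E}_{z\sim p_z,\; c\sim p_{\text{data}}}\big[\log\big(1 - D(G(z, c), c)\big)\big]
```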

Findings

01

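Achieved a Fréchet Inception Distance (FID) of 28.75, which the authors report as indicating high visual quality (FID is a unitless distance; lower values mean generated frames are closer to real ones).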

02

Reported better results than existing video-generation models such as TGAN, MoCoGAN, and TGANS-C.

03

Produced fully interactive educational videos from input text or speech.
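FID fits a Gaussian to feature vectors of real and of generated frames and measures the distance between the two fits:

FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2))

The sketch below shows how this behaves. It uses a hypothetical helper, `diagonal_fid`, that assumes diagonal covariance matrices so it runs with the standard library alone. Under that assumption the trace term reduces to the per-dimension squared difference of standard deviations. This is not the paper's evaluation code. Reported FID values use full covariance matrices over Inception-v3 pool features.

```python
from statistics import fmean, stdev


def diagonal_fid(real_feats, gen_feats):
    """Simplified FID between two sets of feature vectors (hypothetical helper).

    Assumes diagonal covariance matrices, i.e. feature dimensions are treated
    as independent. Each argument is a list of at least two equal-length vectors.
    """
    dims = len(real_feats[0])
    if any(len(v) != dims for v in real_feats + gen_feats):
        raise ValueError("all feature vectors must have the same length")
    total = 0.0
    for d in range(dims):
        real_col = [v[d] for v in real_feats]
        gen_col = [v[d] for v in gen_feats]
        # Mean term: squared distance between the per-dimension means.
        total += (fmean(real_col) - fmean(gen_col)) ** 2
        # Covariance term: with diagonal covariances,
        # Tr(S_r + S_g - 2 (S_r S_g)^(1/2)) = sum over d of (sigma_r - sigma_g)^2.
        # stdev() uses the n - 1 (sample) estimator.
        total += (stdev(real_col) - stdev(gen_col)) ** 2
    return total


real = [[0.0, 1.0], [2.0, 3.0]]
print(diagonal_fid(real, real))                      # identical sets give 0.0
print(diagonal_fid(real, [[1.0, 2.0], [3.0, 4.0]]))  # shifted by 1 per dimension
```

Identical feature sets score 0. Shifting or spreading the generated features increases the score, which is why a lower FID is read as better visual fidelity.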

Abstract

Technological developments have produced methods that can generate educational videos from input text or sound. Recently, deep learning techniques for image and video generation have been widely explored, particularly in education. However, generating video content from conditional inputs such as text or speech remains challenging. In this paper, we introduce a novel Generative Adversarial Network (GAN)-based method for educational content, using frame-by-frame frameworks that can create full educational videos. The proposed system is structured into three main phases. In the first phase, the input (either text or speech) is converted to text, with speech transcribed using speech recognition. In the second phase, key terms are extracted and relevant images are generated using advanced models such as CLIP and diffusion models to enhance visual quality and semantic alignment. In the…
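The phases described above can be sketched as a small pipeline. This is a minimal, hypothetical illustration, not the authors' implementation, and it rests on several assumptions:

- Speech recognition is a `transcribe` callback.
- Key-term extraction is a simple stopword-filtered frequency count, a stand-in, since the abstract does not name the extractor.
- Image generation (CLIP plus diffusion in the paper) is a `generate_image` callback.
- The third phase is truncated in the abstract. Because the TL;DR mentions video synthesis, it is approximated here as ordering the images into a timed storyboard.

The names `Speech`, `extract_key_terms`, `build_storyboard` and `seconds_per_term` are all hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical wrapper marking an input as raw audio rather than text.
@dataclass
class Speech:
    audio: bytes

# Small illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "by", "for", "with", "on", "that", "this", "it", "as"}


def phase1_to_text(item, transcribe):
    """Phase 1: text passes through; speech goes to the supplied ASR callback."""
    if isinstance(item, Speech):
        return transcribe(item.audio)
    return item


def extract_key_terms(text, k=3):
    """Stand-in extractor: top-k non-stopword words by frequency.

    Ties keep first-seen order, because Counter preserves insertion order
    and most_common() sorts stably.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(k)]


def build_storyboard(item, transcribe, generate_image, k=3, seconds_per_term=4.0):
    text = phase1_to_text(item, transcribe)
    # Phase 2: one generated image per key term (callback stands in for CLIP + diffusion).
    terms = extract_key_terms(text, k)
    frames = [generate_image(term) for term in terms]
    # Phase 3 (assumed): place each image at a start time for later video synthesis.
    return [(i * seconds_per_term, term, frame)
            for i, (term, frame) in enumerate(zip(terms, frames))]


lesson = "Photosynthesis converts light energy. Plants use photosynthesis and light to make sugar."
for start, term, frame in build_storyboard(lesson, transcribe=lambda b: b.decode(),
                                           generate_image=lambda t: "img:" + t):
    print(start, term, frame)
```

Passing the speech recognizer and image generator in as callbacks keeps the sketch runnable without any model weights. Either stage can then be swapped for a real ASR or diffusion model.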

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization