Talking Slide Avatars: Open-Source Multimodal Communication Approach for Teaching
Xinxing Wu

TL;DR
This paper introduces an open-source workflow for creating talking slide avatars that enhance online teaching by combining multimodal communication, aesthetic design, and ethical considerations.
Contribution
It presents a novel educator-oriented open-source model for producing talking slide avatars, framing them as educational communication artifacts.
Findings
Short, transparent avatars humanize slide instruction.
The workflow enables reusable, engaging multimedia content.
Guidelines improve ethical and effective avatar use.
Abstract
Slide-based teaching is widely used in higher education, yet in online, hybrid, and asynchronous contexts, slides often lose the instructor presence, narrative continuity, and expressive framing that help learners connect with content. Full lecture video can partly restore these qualities, but it is time-consuming to record, revise, and reuse. This study addresses that pedagogical and production challenge by presenting a practice-based analysis of an open-source workflow for creating talking slide avatars for slide-based teaching. The workflow integrates OpenVoice for text-to-speech generation and voice cloning with Ditto-TalkingHead for audio-driven talking-image synthesis, enabling instructors to transform a script and a static portrait into a short narrated video that can be embedded in slide decks or HTML-based lecture materials. Rather than treating this workflow merely as a…
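The final step the abstract describes, embedding the rendered clip in HTML-based lecture materials, can be illustrated with a minimal Python sketch. It does not call OpenVoice or Ditto-TalkingHead and assumes the video file has already been produced by that workflow. The function name `avatar_video_tag`, the CSS class, the file paths, and the "AI-generated" disclosure wording are all hypothetical choices for illustration, not taken from the paper.

```python
from html import escape
from typing import Optional


def avatar_video_tag(video_src: str, transcript: str, width: int = 320,
                     captions_src: Optional[str] = None) -> str:
    """Return an HTML fragment that embeds a short avatar clip.

    The fragment pairs the video with a visible disclosure and transcript,
    so the synthetic narration is labeled and readable without audio.
    """
    track = ""
    if captions_src is not None:
        # Optional WebVTT captions file (hypothetical path supplied by caller).
        track = (
            f'    <track kind="captions" src="{escape(captions_src, quote=True)}" '
            'srclang="en" label="English">\n'
        )
    return (
        '<figure class="talking-avatar">\n'
        f'  <video controls preload="metadata" width="{int(width)}" '
        f'src="{escape(video_src, quote=True)}">\n'
        f'{track}'
        '  </video>\n'
        '  <figcaption>AI-generated avatar narration. '
        f'Transcript: {escape(transcript)}</figcaption>\n'
        '</figure>'
    )


if __name__ == "__main__":
    # Example: print a fragment that could be pasted into an HTML slide.
    print(avatar_video_tag("clips/intro.mp4", "Welcome to the course.",
                           captions_src="clips/intro.vtt"))
```

Escaping the paths and transcript keeps the fragment valid when the script contains characters such as `<` or `&`; the captions track is optional so the same helper works for clips that have no subtitle file.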
