Talking Slide Avatars: Open-Source Multimodal Communication Approach for Teaching

Xinxing Wu

arXiv:2604.23703 · cs.HC · April 28, 2026

TL;DR

This paper introduces an open-source workflow for creating talking slide avatars that enhance online teaching by combining multimodal communication, aesthetic design, and ethical considerations.

Contribution

The paper presents a novel educator-oriented open-source workflow for producing talking slide avatars and frames them as educational communication artifacts.

Findings

01. Short, transparent avatars humanize slide instruction.

02. The workflow enables reusable, engaging multimedia content.

03. Guidelines improve ethical and effective avatar use.

Abstract

Slide-based teaching is widely used in higher education, yet in online, hybrid, and asynchronous contexts, slides often lose the instructor presence, narrative continuity, and expressive framing that help learners connect with content. Full lecture video can partly restore these qualities, but it is time-consuming to record, revise, and reuse. This study addresses that pedagogical and production challenge by presenting a practice-based analysis of an open-source workflow for creating talking slide avatars for slide-based teaching. The workflow integrates OpenVoice for text-to-speech generation and voice cloning with Ditto-TalkingHead for audio-driven talking-image synthesis, enabling instructors to transform a script and a static portrait into a short narrated video that can be embedded in slide decks or HTML-based lecture materials. Rather than treating this workflow merely as a…
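
The final step described above, embedding the narrated clip in HTML-based lecture materials, can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the helper name `avatar_video_tag`, the CSS class `slide-avatar`, the file name `intro_avatar.mp4`, and the caption wording are all assumptions, and the clip itself is assumed to have been rendered already by the speech and talking-image steps.

```python
from html import escape


def avatar_video_tag(video_src: str, caption: str, width: int = 240) -> str:
    """Return an HTML fragment that embeds a short avatar clip on a slide.

    Hypothetical helper: the markup and class name are illustrative only.
    """
    # Escape both values so file names or captions cannot break the markup.
    safe_src = escape(video_src, quote=True)
    safe_caption = escape(caption)
    return (
        '<figure class="slide-avatar">'
        f'<video src="{safe_src}" width="{width}" controls playsinline></video>'
        f"<figcaption>{safe_caption}</figcaption>"
        "</figure>"
    )


if __name__ == "__main__":
    # Assumed file name and caption; the caption discloses that the clip is generated.
    print(avatar_video_tag("intro_avatar.mp4", "AI-generated avatar narration"))
```

The visible caption is one possible way to keep the avatar transparent to learners; the same fragment can be pasted into any HTML slide framework that accepts raw markup.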
