Calliope: A TTS-based Narrated E-book Creator Ensuring Exact Synchronization, Privacy, and Layout Fidelity
Hugo L. Hammer, Vajira Thambawita, Pål Halvorsen

TL;DR
Calliope is an open-source framework that converts text e-books into synchronized narrated EPUB 3 media overlays using open-source TTS, ensuring privacy, exact synchronization, and preservation of original formatting.
Contribution
Calliope introduces a novel offline pipeline that records audio timestamps during TTS synthesis, yielding exact synchronization while preserving the original text styling; it fills the gap left by the lack of open-source narrated e-book tools.
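A minimal Python sketch of the timestamp-capture idea, assuming sentences are synthesized one at a time and concatenated into a single audio stream. Because each clip's begin and end times are computed from the same sample counts that are written to the audio, the timings cannot drift from the audio. `fake_tts`, `synthesize_with_timestamps`, the `Segment` type, and the 22,050 Hz sample rate are hypothetical stand-ins, not Calliope's actual API or TTS engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

SAMPLE_RATE = 22_050  # assumed TTS output rate in Hz (hypothetical)


@dataclass
class Segment:
    fragment_id: str  # id of the text fragment (e.g. a sentence <span>) in the XHTML
    begin: float      # clip start in seconds
    end: float        # clip end in seconds


def fake_tts(text: str) -> List[float]:
    """Hypothetical stand-in for a TTS engine: 1,000 silent samples per character."""
    return [0.0] * (len(text) * 1000)


def synthesize_with_timestamps(
    sentences: Sequence[str],
    tts: Callable[[str], List[float]],
    sample_rate: int = SAMPLE_RATE,
) -> Tuple[List[float], List[Segment]]:
    """Synthesize sentences one by one, concatenating audio and recording each clip's span."""
    audio: List[float] = []
    segments: List[Segment] = []
    for i, sentence in enumerate(sentences, start=1):
        samples = tts(sentence)
        start = len(audio)  # sample offset where this sentence begins
        audio.extend(samples)
        # Times are derived from the samples actually written, so they cannot drift.
        segments.append(Segment(f"s{i}", start / sample_rate, len(audio) / sample_rate))
    return audio, segments
```

Any pause inserted between sentences would simply be appended to `audio` before the next `start` is taken, so later offsets stay correct.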
Findings
Exact synchronization is achieved without drift.
The pipeline preserves original typography and media.
Offline operation reduces costs and privacy concerns.
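One way to keep the original typography is to leave the source XHTML in place and only wrap each sentence in a `<span>` carrying an `id` that a media overlay can point to. The simplified sketch below, using Python's standard `xml.etree.ElementTree`, keeps each paragraph element and its attributes (such as `class`) and splits only plain-text paragraphs; paragraphs with inline markup are skipped for brevity, and `ElementTree` does not round-trip comments or a doctype. The function name, the regex sentence splitter, and the `s1`, `s2`, … id scheme are illustrative assumptions, not Calliope's implementation.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize XHTML without an "ns0:" prefix

# Lazily match up to a run of sentence-ending punctuation (plus trailing spaces)
# or the end of the text, so the matches concatenate back to the original string.
SENTENCE_RE = re.compile(r".+?(?:[.!?]+\s*|$)", re.DOTALL)


def wrap_sentences(xhtml: str, prefix: str = "s") -> Tuple[str, List[str]]:
    """Wrap each sentence of plain-text <p> elements in <span id=...>; return XHTML and ids."""
    root = ET.fromstring(xhtml)
    ids: List[str] = []
    # Materialize the list first, since children are added while walking paragraphs.
    paragraphs = list(root.iter(f"{{{XHTML_NS}}}p"))
    for p in paragraphs:
        if len(p) > 0 or not p.text:
            continue  # skip empty paragraphs and those with inline markup in this sketch
        text, p.text = p.text, None
        for match in SENTENCE_RE.finditer(text):
            sid = f"{prefix}{len(ids) + 1}"
            span = ET.SubElement(p, f"{{{XHTML_NS}}}span", {"id": sid})
            span.text = match.group(0)
            ids.append(sid)
    return ET.tostring(root, encoding="unicode"), ids
```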
Abstract
A narrated e-book combines synchronized audio with digital text, highlighting the currently spoken word or sentence during playback. This format supports early literacy and assists individuals with reading challenges, while also allowing general readers to seamlessly switch between reading and listening. With the emergence of natural-sounding neural Text-to-Speech (TTS) technology, several commercial services have been developed that leverage this technology to convert standard text e-books into high-quality narrated e-books. However, no open-source solutions currently exist to perform this task. In this paper, we present Calliope, an open-source framework designed to fill this gap. Our method leverages state-of-the-art open-source TTS to convert a text e-book into a narrated e-book in the EPUB 3 Media Overlay format. The method offers several innovative steps: audio timestamps are…
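In the EPUB 3 Media Overlay format, synchronization is described by a SMIL document whose `par` elements each pair a `text` reference (a fragment id in the XHTML) with an `audio` clip bounded by `clipBegin` and `clipEnd`. The Python sketch below writes such a document from `(fragment_id, begin, end)` tuples given in seconds. The file names and the `build_smil` and `format_clock` helpers are hypothetical, and Calliope's actual output may be structured differently (for example, grouped into `seq` elements).

```python
from typing import Sequence, Tuple
from xml.sax.saxutils import quoteattr


def format_clock(seconds: float) -> str:
    """Format seconds as a SMIL full clock value, e.g. 0:00:02.500."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h}:{m:02d}:{s:02d}.{ms:03d}"


def build_smil(text_href: str, audio_href: str,
               segments: Sequence[Tuple[str, float, float]]) -> str:
    """Emit one <par> per text fragment, each pointing at its audio clip."""
    pars = []
    for i, (frag_id, begin, end) in enumerate(segments, start=1):
        pars.append(
            f'    <par id="par{i}">\n'
            f'      <text src={quoteattr(text_href + "#" + frag_id)}/>\n'
            f'      <audio src={quoteattr(audio_href)} '
            f'clipBegin="{format_clock(begin)}" clipEnd="{format_clock(end)}"/>\n'
            f'    </par>'
        )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">\n'
        '  <body>\n' + "\n".join(pars) + '\n  </body>\n</smil>\n'
    )
```

For example, `build_smil("chapter1.xhtml", "audio/chapter1.mp3", [("s1", 0.0, 2.5), ("s2", 2.5, 4.0)])` yields two `par` elements whose clips meet exactly at 2.5 seconds.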
Taxonomy
Topics: Child Development and Digital Technology · Interactive and Immersive Displays · Multimedia Communication and Technology
