AI4Reading: Chinese Audiobook Interpretation System Based on Multi-Agent Collaboration
Minjiang Huang, Jipeng Qiang, Yi Zhu, Chaowei Zhang, Xiangyu Zhao, Kui Yu

TL;DR
AI4Reading is a multi-agent system that uses large language models and speech synthesis to automate the creation of Chinese audiobook interpretations, aiming for accuracy and clarity.
Contribution
This paper introduces a multi-agent framework that uses LLMs and speech synthesis to automate the creation of audiobook interpretations, with the aim of improving efficiency and content quality.
Findings
Generated scripts are simpler and more accurate than expert interpretations.
The system achieves key objectives of content preservation, clarity, and logical structure.
Speech quality still falls short of human narration.
Abstract
Audiobook interpretations are attracting increasing attention, as they provide accessible and in-depth analyses of books that offer readers practical insights and intellectual inspiration. However, their manual creation process remains time-consuming and resource-intensive. To address this challenge, we propose AI4Reading, a multi-agent collaboration system leveraging large language models (LLMs) and speech synthesis technology to generate podcast-like audiobook interpretations. The system is designed to meet three key objectives: accurate content preservation, enhanced comprehensibility, and a logical narrative structure. To achieve these goals, we develop a framework composed of 11 specialized agents, including topic analysts, case analysts, editors, a narrator, and proofreaders, that work in concert to explore themes, extract real-world cases, refine content organization, and…
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Speech and dialogue systems
