Beyond Transcripts: Iterative Peer-Editing with Audio Unlocks High-Quality Human Summaries of Conversational Speech
Kaavya Chaparala, Thomas Thebaud, Jesús Villalba López, Laureano Moro-Velazquez, Peter Viechnicki, and Najim Dehak

TL;DR
This paper investigates how iterative peer-editing of human-written summaries, with annotators working from the audio, can produce high-quality speech summaries comparable to transcript-based summaries and LLM outputs, especially when transcripts are unavailable.
Contribution
It demonstrates that peer-editing with audio enhances the informativeness of summaries, enabling effective benchmark creation without relying solely on transcripts.
Findings
Audio-based summaries are less informative and more compressed than transcript-based summaries.
Peer-editing with audio improves summary quality to match transcript-based summaries.
Human summaries can be as informative as LLM outputs when using iterative peer-editing.
Abstract
There are not enough established benchmarks for the task of speech summarization. Creating new benchmarks demands human annotation, as LLMs could embed systemic errors and bias into datasets. We test ten annotation workflows varying input modality (audio, transcript, or both) and the inclusion of editing (self- or peer-editing) to investigate potential quality tradeoffs from using human annotators to summarize audio. We compare human audio-based summaries to human transcript-based summaries to track the impact of the different information modalities on summary quality. We also compare the human outputs against four LLM benchmarks (three text, one audio) to examine whether human-written summaries are less informative than highly fluent automated outputs. We find that audio-based summaries are less informative and more compressed than transcript summaries. However, iterative peer-editing…
