Cross-Lingual Conversational Speech Summarization with Large Language Models
Max Nelson, Shannon Wotherspoon, Francis Keith, William Hartmann, Matthew Snover

TL;DR
This paper addresses cross-lingual conversational speech summarization by augmenting an existing Spanish-English speech translation corpus with GPT-4-generated summaries, analyzing how transcription and translation errors affect summary quality, and adapting large language models to the task.
Contribution
It introduces a new resource, the Fisher and Callhome Spanish-English Speech Translation corpus supplemented with summaries, and demonstrates effective adaptation of LLMs such as Mistral-7B for this task.
Findings
An adapted Mistral-7B outperforms off-the-shelf models
GPT-4-generated summaries of the reference translations serve as effective ground truth
Transcription and translation errors significantly impact summarization quality
Abstract
Cross-lingual conversational speech summarization is an important problem, but suffers from a dearth of resources. While transcriptions exist for a number of languages, translated conversational speech is rare and datasets containing summaries are non-existent. We build upon the existing Fisher and Callhome Spanish-English Speech Translation corpus by supplementing the translations with summaries. The summaries are generated using GPT-4 from the reference translations and are treated as ground truth. The task is to generate similar summaries in the presence of transcription and translation errors. We build a baseline cascade-based system using open-source speech recognition and machine translation models. We test a range of LLMs for summarization and analyze the impact of transcription and translation errors. Adapting the Mistral-7B model for this task performs significantly better than…
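The cascade described in the abstract (speech recognition, then machine translation, then LLM summarization) can be sketched as below. This is a minimal sketch, not the authors' system: the stage interfaces, `run_cascade`, and the toy stand-in models are all hypothetical. Substituting a reference transcript or reference translation for a stage's output is shown as one assumed way to isolate that stage's errors; the paper's exact evaluation protocol may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage interfaces; a real system would wrap an ASR model,
# an MT model and an LLM behind these signatures.
Transcriber = Callable[[bytes], str]  # Spanish audio -> Spanish transcript
Translator = Callable[[str], str]     # Spanish text -> English text
Summarizer = Callable[[str], str]     # English text -> English summary


@dataclass
class CascadeOutput:
    transcript: str
    translation: str
    summary: str


def run_cascade(
    audio: bytes,
    asr: Transcriber,
    mt: Translator,
    summarize: Summarizer,
    ref_transcript: Optional[str] = None,
    ref_translation: Optional[str] = None,
) -> CascadeOutput:
    """ASR -> MT -> summarization. Supplying a reference for a stage replaces
    that stage's output, so its errors no longer reach later stages."""
    transcript = ref_transcript if ref_transcript is not None else asr(audio)
    translation = ref_translation if ref_translation is not None else mt(transcript)
    summary = summarize(translation)
    return CascadeOutput(transcript, translation, summary)


# Toy stand-ins (hypothetical) so the sketch runs without any models.
TOY_LEXICON = {"hola": "hello", "amigo": "friend"}


def toy_noisy_asr(audio: bytes) -> str:
    return "ola amigo"  # simulates a recognition error ("hola" -> "ola")


def toy_word_mt(text: str) -> str:
    # Word-by-word lookup; unknown words are passed through untranslated.
    return " ".join(TOY_LEXICON.get(w, w) for w in text.split())


def toy_summarizer(text: str) -> str:
    return "Summary: " + " ".join(text.split()[:5])


if __name__ == "__main__":
    audio = b"\x00\x01"  # placeholder bytes standing in for real audio
    asr_path = run_cascade(audio, toy_noisy_asr, toy_word_mt, toy_summarizer)
    oracle_path = run_cascade(audio, toy_noisy_asr, toy_word_mt, toy_summarizer,
                              ref_transcript="hola amigo")
    print(asr_path.summary)     # Summary: ola friend
    print(oracle_path.summary)  # Summary: hello friend
```

Keeping each stage behind a plain callable makes it easy to swap ASR, MT, or summarization models independently, which is what comparing off-the-shelf and adapted LLMs under the same upstream errors requires.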
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax
