Cross-Lingual Conversational Speech Summarization with Large Language Models

Max Nelson; Shannon Wotherspoon; Francis Keith; William Hartmann; Matthew Snover

arXiv:2408.06484·cs.CL·August 14, 2024




Open Access

TL;DR

This paper addresses cross-lingual conversational speech summarization by augmenting existing datasets with GPT-4-generated summaries, analyzing the impact of transcription and translation errors, and adapting large language models to improve performance.

Contribution

It introduces a new resource by adding summaries to the Fisher and Callhome Spanish-English Speech Translation corpus, and demonstrates that LLMs such as Mistral-7B can be effectively adapted to this task.

Findings

01

An adapted Mistral-7B outperforms off-the-shelf models

02

GPT-4-generated summaries serve as effective ground truth

03

Transcription and translation errors significantly impact summarization quality
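The page does not say which metric is used to quantify transcription errors. Word error rate (WER) is the standard measure for speech recognition output, so the sketch below assumes it: the word-level Levenshtein distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` is illustrative, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance normalized by reference length (assumed metric)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    # max(..., 1) avoids division by zero for an empty reference.
    return prev[-1] / max(len(ref), 1)


# One substitution ("estas" -> "esta") and one deletion ("tu") over 4 words.
print(word_error_rate("hola como estas tu", "hola como esta"))  # 0.5
```

Translation quality would need its own metric; this sketch covers only the transcription side.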

Abstract

Cross-lingual conversational speech summarization is an important problem, but suffers from a dearth of resources. While transcriptions exist for a number of languages, translated conversational speech is rare and datasets containing summaries are non-existent. We build upon the existing Fisher and Callhome Spanish-English Speech Translation corpus by supplementing the translations with summaries. The summaries are generated using GPT-4 from the reference translations and are treated as ground truth. The task is to generate similar summaries in the presence of transcription and translation errors. We build a baseline cascade-based system using open-source speech recognition and machine translation models. We test a range of LLMs for summarization and analyze the impact of transcription and translation errors. Adapting the Mistral-7B model for this task performs significantly better than…
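The cascade baseline chains three stages. Speech recognition produces a Spanish transcript, machine translation turns it into English, and an LLM summarizes the translation, so errors made early on propagate into the summary. The sketch below shows only this data flow under stated assumptions: `run_cascade` and the toy stage functions are hypothetical stand-ins, not the paper's ASR, MT, or LLM components.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; a real system would wrap actual models.
Transcriber = Callable[[str], str]  # audio path -> Spanish transcript
Translator = Callable[[str], str]   # Spanish text -> English text
Summarizer = Callable[[str], str]   # English text -> English summary


@dataclass
class CascadeOutput:
    transcript: str
    translation: str
    summary: str


def run_cascade(audio_path: str, asr: Transcriber, mt: Translator,
                summarize: Summarizer) -> CascadeOutput:
    """ASR -> MT -> summarization; each stage consumes the previous stage's output."""
    transcript = asr(audio_path)
    translation = mt(transcript)
    summary = summarize(translation)
    return CascadeOutput(transcript, translation, summary)


# Toy stand-ins so the sketch runs without any models.
def toy_asr(path: str) -> str:
    return "hola como estas"


def toy_mt(text: str) -> str:
    return {"hola como estas": "hello how are you"}.get(text, text)


def toy_summarize(text: str) -> str:
    return "Greeting: " + text


out = run_cascade("call_0001.wav", toy_asr, toy_mt, toy_summarize)
print(out.summary)  # Greeting: hello how are you
```

Because each stage is a plain function, one can substitute reference outputs for a stage (for example, passing the reference translation directly to the summarizer, the same input GPT-4 used to produce the ground-truth summaries) to isolate how much each stage's errors contribute. This is an illustrative setup, not necessarily the paper's exact protocol.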


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Absolute Position Encodings · Softmax