Recent Advances in Direct Speech-to-text Translation

Chen Xu; Rong Ye; Qianqian Dong; Chengqi Zhao; Tom Ko; Mingxuan Wang; Tong Xiao; Jingbo Zhu

arXiv:2306.11646 · cs.CL · June 21, 2023 · 2 cites

TL;DR

This survey reviews recent progress in direct speech-to-text translation, focusing on modeling, data, and application challenges, and discusses future research directions in the field.

Contribution

It provides a comprehensive categorization and analysis of current techniques addressing key challenges in direct speech translation, highlighting recent advances and future prospects.

Findings

01 Encoder-decoder and multitask frameworks address modeling burden.

02 Data augmentation, pre-training, and multilingual modeling mitigate data scarcity.

03 Application issues like real-time processing and gender bias are actively studied.

Abstract

Speech-to-text translation has recently attracted growing attention, and many studies have emerged rapidly. In this paper, we present a comprehensive survey on direct speech translation that aims to summarize the current state-of-the-art techniques. First, we categorize existing research into three directions based on the main challenges: modeling burden, data scarcity, and application issues. To tackle the modeling burden, two main structures have been proposed: the encoder-decoder framework (Transformer and its variants) and multitask frameworks. For the challenge of data scarcity, recent work resorts to many sophisticated techniques, such as data augmentation, pre-training, knowledge distillation, and multilingual modeling. We analyze and summarize the application issues, which include real-time translation, segmentation, named entities, gender bias, and code-switching.…

Datasets

BAAI/SurveyScope
dataset · 6 dl

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling