Efficient Speech Translation with Dynamic Latent Perceivers

Ioannis Tsiamas; Gerard I. Gállego; José A. R. Fonollosa; Marta R. Costa-jussà

arXiv:2210.16264·cs.CL·March 15, 2023·1 cites

TL;DR

This paper introduces a novel speech translation model using a Perceiver encoder with Dynamic Latent Access, reducing complexity and enabling flexible deployment without sacrificing translation quality.

Contribution

The paper proposes a Perceiver-based architecture, trained with Dynamic Latent Access (DLA), for efficient speech translation that matches Transformer baselines across three language pairs in MuST-C.

Findings

01

Speech-to-Text Perceivers trained with DLA match the performance of Transformer baselines on MuST-C.

02

DLA enables flexible deployment across different computational budgets.

03

Mapping speech inputs to a fixed-length latent representation eases the quadratic complexity of the Transformer.

Abstract

Transformers have been the dominant architecture for Speech Translation in recent years, achieving significant improvements in translation quality. Since speech signals are longer than their textual counterparts, and due to the quadratic complexity of the Transformer, a down-sampling step is essential for its adoption in Speech Translation. Instead, in this research, we propose to ease the complexity by using a Perceiver encoder to map the speech inputs to a fixed-length latent representation. Furthermore, we introduce a novel way of training Perceivers, with Dynamic Latent Access (DLA), unlocking larger latent spaces without any additional computational overhead. Speech-to-Text Perceivers with DLA can match the performance of Transformer baselines across three language pairs in MuST-C. Finally, a DLA-trained model is easily adaptable to DLA at inference, and can be flexibly deployed…
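The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not spell out how DLA selects latents, so the sketch assumes that each training step draws a random subset of k latents from a larger bank of N. The function names, the single-head attention without learned projections, and all sizes are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attend(latents, inputs):
    # Single-head cross-attention: latents act as queries,
    # speech frames act as keys and values.
    # Learned projection matrices are omitted to keep the sketch short.
    d = latents.shape[-1]
    scores = latents @ inputs.T / np.sqrt(d)   # shape (k, n_frames)
    return softmax(scores, axis=-1) @ inputs   # shape (k, d)

def sample_latents(latent_bank, k, rng):
    # ASSUMPTION: DLA is modelled here as drawing k of the N latents
    # uniformly without replacement at each step.
    idx = rng.choice(latent_bank.shape[0], size=k, replace=False)
    return latent_bank[np.sort(idx)]

rng = np.random.default_rng(0)
n_frames, d_model = 1000, 64       # hypothetical input length and width
N, k = 512, 128                    # hypothetical bank size and active latents

speech = rng.standard_normal((n_frames, d_model))
latent_bank = rng.standard_normal((N, d_model))

active = sample_latents(latent_bank, k, rng)
encoded = cross_attend(active, speech)
print(encoded.shape)
```

Under these assumptions the attention score matrix has k x n_frames entries rather than n_frames x n_frames, and its size depends on k rather than on the bank size N, which is one way to read the claim that a larger latent space adds no computational overhead. Changing k at inference would correspond to deploying the same model under a different computational budget.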

Code & Models

Repositories

mt-upc/s2t-perceiver
Official

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization