Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation

Matthew Raffel; Lizhong Chen

arXiv:2307.01381·cs.CL·July 6, 2023

TL;DR

This paper introduces an Implicit Memory Transformer for simultaneous speech translation. It retains context implicitly rather than through explicit memory banks, which speeds up processing while keeping translation quality comparable to existing methods.

Contribution

The paper proposes a novel implicit memory mechanism using left context derived from attention outputs, eliminating the need for explicit memory banks in streaming translation.

Findings

1. Significant speedup in encoder processing time.
2. Maintains translation quality comparable to state-of-the-art methods.
3. Reduces computational complexity by removing explicit memory representations.

Abstract

Simultaneous speech translation is an essential communication task difficult for humans whereby a translation is generated concurrently with oncoming speech inputs. For such a streaming task, transformers using block processing to break an input sequence into segments have achieved state-of-the-art performance at a reduced cost. Current methods to allow information to propagate across segments, including left context and memory banks, have faltered as they are both insufficient representations and unnecessarily expensive to compute. In this paper, we propose an Implicit Memory Transformer that implicitly retains memory through a new left context method, removing the need to explicitly represent memory with memory banks. We generate the left context from the attention output of the previous segment and include it in the keys and values of the current segment's attention calculation.…
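The last sentence of the abstract describes the core mechanism: the attention output of one segment becomes the left context of the next, and that left context is added to the keys and values (but not the queries) of the next segment's attention. A minimal sketch of that idea follows. It makes several assumptions that are not from the paper: a single attention head, no learned projections, a left context taken from the last `left_len` output frames, and the hypothetical helper names `attend` and `stream_segments`. It illustrates the data flow only and is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors.

    Assumption: no learned Q/K/V projections, to keep the sketch short.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def stream_segments(segments, left_len):
    """Process segments in order, carrying an implicit left context.

    The left context for each segment is built from the attention output
    of the previous segment (here: its last `left_len` frames, which is
    an assumption of this sketch). No separate memory bank is kept.
    """
    left_context = []  # nothing precedes the first segment
    all_outputs = []
    for segment in segments:
        # Left context joins the keys and values only; queries are the
        # frames of the current segment.
        keys_values = left_context + segment
        out = attend(segment, keys_values, keys_values)
        all_outputs.append(out)
        # Next segment's left context comes from this segment's output.
        left_context = out[max(0, len(out) - left_len):]
    return all_outputs


if __name__ == "__main__":
    toy_segments = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.5, 0.5]],
    ]
    for i, out in enumerate(stream_segments(toy_segments, left_len=1)):
        print(f"segment {i}: {out}")
```

Because the carried frames are themselves attention outputs, they already mix in information from earlier segments, which is the sense in which memory is retained implicitly in this sketch. Setting `left_len=0` reduces it to independent block processing with no cross-segment context.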

Code & Models

Repositories

osu-starlab/implicitmemory (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Residual Connection