Implicit Memory Transformer for Computationally Efficient Simultaneous Speech Translation
Matthew Raffel, Lizhong Chen

TL;DR
This paper introduces an Implicit Memory Transformer for simultaneous speech translation that retains context implicitly instead of storing it in explicit memory banks, reducing computation while keeping translation quality comparable to existing methods.
Contribution
The paper proposes a novel implicit memory mechanism using left context derived from attention outputs, eliminating the need for explicit memory banks in streaming translation.
Findings
Significant speedup in encoder processing time.
Maintains translation quality comparable to state-of-the-art methods.
Reduces computational complexity by removing explicit memory representations.
Abstract
Simultaneous speech translation is an essential communication task difficult for humans whereby a translation is generated concurrently with oncoming speech inputs. For such a streaming task, transformers using block processing to break an input sequence into segments have achieved state-of-the-art performance at a reduced cost. Current methods to allow information to propagate across segments, including left context and memory banks, have faltered as they are both insufficient representations and unnecessarily expensive to compute. In this paper, we propose an Implicit Memory Transformer that implicitly retains memory through a new left context method, removing the need to explicitly represent memory with memory banks. We generate the left context from the attention output of the previous segment and include it in the keys and values of the current segment's attention calculation.…
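The left-context mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head with identity query/key/value projections and plain Python lists, and the names `attend`, `encode_stream`, `segment_len`, and `left_len` are hypothetical. It only shows the data flow, in which the last few attention outputs of one segment become extra keys and values for the next segment.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    # Scaled dot-product attention over lists of equal-length vectors.
    # Assumption: no learned projections (identity Q/K/V), single head.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)])
    return out


def encode_stream(frames, segment_len, left_len):
    # Process the input in fixed-size segments (block processing).
    left_context = []  # empty for the first segment
    outputs = []
    for start in range(0, len(frames), segment_len):
        segment = frames[start:start + segment_len]
        # Queries come from the current segment only; keys and values also
        # include the left context carried over from the previous segment.
        kv = left_context + segment
        attended = attend(segment, kv, kv)
        outputs.extend(attended)
        # The new left context is taken from this segment's attention output,
        # so information from earlier segments is carried along implicitly.
        left_context = attended[-left_len:] if left_len > 0 else []
    return outputs


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
with_context = encode_stream(frames, segment_len=2, left_len=1)
without_context = encode_stream(frames, segment_len=2, left_len=0)
```

Because the carried-over vectors are themselves attention outputs that already mixed in the previous left context, no separate memory bank has to be computed or stored in this sketch. Setting `left_len=0` disables the carry-over, so each segment is encoded in isolation.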
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Linear Layer · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Residual Connection
