A Deep Memory-based Architecture for Sequence-to-Sequence Learning
Fandong Meng, Zhengdong Lu, Zhaopeng Tu, Hang Li, and Qun Liu

TL;DR
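DEEPMEMORY is a deep neural architecture for sequence-to-sequence tasks such as machine translation. It stores intermediate representations in stacked memory layers and moves between them with learned nonlinear read-write transformations. The paper reports improvements over earlier neural translation models and performance comparable to phrase-based systems.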
Contribution
It introduces a deep memory-based architecture inspired by Neural Turing Machines, enabling more complex sequence modeling for translation tasks.
Findings
With its deeper architecture, outperforms previous neural translation models.
Achieves performance comparable to traditional phrase-based systems.
Easily scalable to large datasets.
Abstract
We propose DEEPMEMORY, a novel deep architecture for sequence-to-sequence learning, which performs the task through a series of nonlinear transformations from the representation of the input sequence (e.g., a Chinese sentence) to the final output sequence (e.g., translation to English). Inspired by the recently proposed Neural Turing Machine (Graves et al., 2014), we store the intermediate representations in stacked layers of memories, and use read-write operations on the memories to realize the nonlinear transformations between the representations. The types of transformations are designed in advance but the parameters are learned from data. Through layer-by-layer transformations, DEEPMEMORY can model complicated relations between sequences necessary for applications such as machine translation between distant languages. The architecture can be trained with normal back-propagation on…
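The read-write idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`content_read`, `transform_layer`), the fixed `queries`, and the `scale` parameter are all hypothetical stand-ins, and where the paper learns the transformation parameters from data, this example hard-codes them. It only shows one plausible shape of the computation, assuming a content-based (attention-style) read from one memory layer followed by a tanh nonlinearity when writing the next layer.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def content_read(memory, query):
    """Content-based read: attention-weighted average of the memory cells."""
    # Dot-product similarity between the query and each cell (an assumption;
    # other scoring functions are possible).
    scores = [sum(q * c for q, c in zip(query, cell)) for cell in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    return [sum(w * cell[i] for w, cell in zip(weights, memory)) for i in range(dim)]

def transform_layer(read_memory, queries, scale=1.0):
    """Write a new memory layer: one read plus a tanh nonlinearity per cell.

    `scale` is a hypothetical placeholder for parameters that would be learned.
    """
    return [[math.tanh(scale * x) for x in content_read(read_memory, q)]
            for q in queries]

# Toy lower-layer memory with three 2-dimensional cells.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical queries, one per cell of the next layer.
queries = [[5.0, 0.0], [0.0, 5.0]]
next_memory = transform_layer(memory, queries)
```

Stacking several calls to `transform_layer`, each reading from the memory written by the previous one, gives the layer-by-layer pattern the abstract describes, with the caveat that the real model learns its queries and transformation parameters through back-propagation.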
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Sigmoid Activation · Tanh Activation · Neural Turing Machine · Content-based Attention · Long Short-Term Memory
