Depthwise Separable Convolutions for Neural Machine Translation
Lukasz Kaiser, Aidan N. Gomez, Francois Chollet

TL;DR
This paper explores the application of depthwise separable convolutions in neural machine translation, introducing a new architecture that reduces parameters and computation while achieving state-of-the-art results.
Contribution
The paper introduces SliceNet, a novel architecture using depthwise separable convolutions for machine translation, enabling larger convolution windows and improved efficiency.
Findings
Depthwise separable convolutions perform well in machine translation.
SliceNet achieves state-of-the-art results with fewer parameters.
Super-separable convolutions further reduce computational costs.
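The parameter savings behind these findings can be illustrated with a rough count. The sketch below is not taken from this page: it assumes 1D convolutions with kernel width k and c input and output channels, ignores biases, and assumes that a super-separable convolution splits the channels into g groups and applies a separable convolution to each group. The function names and the example sizes (k=7, c=512, g=2) are hypothetical.

```python
def conv_params(k, c):
    """Full 1D convolution: every output channel sees every input channel at each of k positions."""
    return k * c * c


def separable_params(k, c):
    """Depthwise step (k weights per channel) plus a pointwise 1x1 mixing step (c * c weights)."""
    return k * c + c * c


def super_separable_params(k, c, g):
    """Assumed definition: split c channels into g groups, one separable convolution per group."""
    if c % g != 0:
        raise ValueError("c must be divisible by g")
    per_group = c // g
    return g * separable_params(k, per_group)


if __name__ == "__main__":
    k, c, g = 7, 512, 2  # hypothetical sizes, chosen only for illustration
    print("full conv:      ", conv_params(k, c))
    print("separable:      ", separable_params(k, c))
    print("super-separable:", super_separable_params(k, c, g))
```

Under these assumptions, widening the kernel adds only c weights per extra position to a separable layer, against c * c for a full convolution, which is consistent with the claim that larger convolution windows become affordable.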
Abstract
Depthwise separable convolutions reduce the number of parameters and computation used in convolutional operations while increasing representational efficiency. They have been shown to be successful in image classification models, both in obtaining better models than previously possible for a given parameter count (the Xception architecture) and considerably reducing the number of parameters required to perform at a given level (the MobileNets family of architectures). Recently, convolutional sequence-to-sequence networks have been applied to machine translation tasks with good results. In this work, we study how depthwise separable convolutions can be applied to neural machine translation. We introduce a new architecture inspired by Xception and ByteNet, called SliceNet, which enables a significant reduction of the parameter count and amount of computation needed to obtain results like…
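As a minimal sketch of the operation the abstract describes, a depthwise separable convolution over a sequence can be written as a per-channel (depthwise) convolution followed by a pointwise 1x1 projection. The NumPy code below is illustrative only and is not SliceNet's implementation: the function names, the (length, channels) layout, zero "same" padding, and stride 1 are all assumptions.

```python
import numpy as np


def depthwise_conv1d(x, dw):
    """Filter each channel independently along the sequence axis.

    x:  (length, channels) input sequence
    dw: (k, channels) one width-k filter per channel
    Uses zero padding so the output has the same length as the input.
    """
    k, _ = dw.shape
    pad_left = (k - 1) // 2
    pad_right = k - 1 - pad_left
    xp = np.pad(x, ((pad_left, pad_right), (0, 0)))
    out = np.zeros(x.shape, dtype=float)
    for i in range(k):
        # Each kernel tap scales a shifted copy of the input, channel by channel.
        out += xp[i:i + x.shape[0]] * dw[i]
    return out


def pointwise_conv1d(x, pw):
    """Mix channels at every position with a 1x1 convolution; pw has shape (c_in, c_out)."""
    return x @ pw


def separable_conv1d(x, dw, pw):
    """Depthwise step followed by pointwise step."""
    return pointwise_conv1d(depthwise_conv1d(x, dw), pw)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(10, 4))   # 10 positions, 4 channels (hypothetical sizes)
    dw = rng.normal(size=(3, 4))   # kernel width 3
    pw = rng.normal(size=(4, 6))   # project 4 channels to 6
    print(separable_conv1d(x, dw, pw).shape)
```

The result equals a full convolution whose kernel factorizes as dw[i, c_in] * pw[c_in, c_out], which is why the separable form needs far fewer weights than an unconstrained kernel of the same window size.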
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Human Pose and Action Recognition
Methods: Convolution
