Depthwise Separable Convolutions for Neural Machine Translation

Lukasz Kaiser; Aidan N. Gomez; Francois Chollet

arXiv:1706.03059 · cs.CL · June 19, 2017 · 247 citations

TL;DR

This paper explores the application of depthwise separable convolutions in neural machine translation, introducing a new architecture that reduces parameters and computation while achieving state-of-the-art results.

Contribution

The paper introduces SliceNet, a novel architecture using depthwise separable convolutions for machine translation, enabling larger convolution windows and improved efficiency.

Findings

01 Depthwise separable convolutions perform well in machine translation.

02 SliceNet achieves state-of-the-art results with fewer parameters.

03 Super-separable convolutions further reduce computational costs.

Abstract

Depthwise separable convolutions reduce the number of parameters and computation used in convolutional operations while increasing representational efficiency. They have been shown to be successful in image classification models, both in obtaining better models than previously possible for a given parameter count (the Xception architecture) and considerably reducing the number of parameters required to perform at a given level (the MobileNets family of architectures). Recently, convolutional sequence-to-sequence networks have been applied to machine translation tasks with good results. In this work, we study how depthwise separable convolutions can be applied to neural machine translation. We introduce a new architecture inspired by Xception and ByteNet, called SliceNet, which enables a significant reduction of the parameter count and amount of computation needed to obtain results like…
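The parameter saving described in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper's code. It assumes 1D convolutions over a sequence, ignores bias terms, and uses hypothetical function names. The grouped variant is only one plausible reading of "super-separable" (channels split into independent groups, each given its own separable convolution) and should be treated as an assumption rather than the authors' exact definition.

```python
def conv1d_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a regular 1D convolution with window k (bias ignored)."""
    return k * c_in * c_out


def separable_conv1d_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable 1D convolution (bias ignored).

    Depthwise step: one length-k filter per input channel (k * c_in).
    Pointwise step: a 1x1 convolution mixing channels (c_in * c_out).
    """
    return k * c_in + c_in * c_out


def grouped_separable_conv1d_params(k: int, c_in: int, c_out: int, groups: int) -> int:
    """Assumed super-separable variant: split channels into `groups`
    independent groups and apply a separable convolution to each."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    per_group = separable_conv1d_params(k, c_in // groups, c_out // groups)
    return groups * per_group


if __name__ == "__main__":
    k, c = 3, 1024  # illustrative window size and channel count
    print(conv1d_params(k, c, c))                       # 3145728
    print(separable_conv1d_params(k, c, c))             # 1051648
    print(grouped_separable_conv1d_params(k, c, c, 2))  # 527360
```

Under these assumptions the window size k only multiplies the small depthwise term, which is consistent with the claim that separable convolutions make larger convolution windows affordable.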

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Human Pose and Action Recognition

Methods: Convolution