TranSFormer: Slow-Fast Transformer for Machine Translation

Bei Li; Yi Jing; Xu Tan; Zhen Xing; Tong Xiao; Jingbo Zhu

arXiv:2305.16982 · cs.CL · May 29, 2023 · 1 citation

Open Access

TL;DR

TranSFormer introduces a dual-branch Transformer that feeds character-level features into a subword-level translation model, improving translation quality at little extra cost on several benchmarks.

Contribution

It proposes a novel Slow-Fast two-stream Transformer architecture that incorporates character-level features into machine translation models.
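To make the two-stream input concrete, here is a minimal sketch of one plausible way character-level features could be handed to a subword-level model: each character is mapped to the subword it belongs to, the fast branch's character vectors are mean-pooled per subword, projected up to the slow-branch width, and added to the slow-branch states. The mean pooling, the additive fusion, the subword-nmt style "@@" continuation marker, and all function names (char_to_subword_index, pool_chars_to_subwords, fuse) are assumptions for illustration, not the paper's published fusion mechanism; plain Python lists stand in for tensors.

```python
from typing import List

# Hypothetical helpers for illustration only; not the TranSFormer implementation.

def char_to_subword_index(subwords: List[str], marker: str = "@@") -> List[int]:
    """For each character of the subword sequence, return the index of its subword.

    Assumes subword-nmt style BPE, where a trailing '@@' marks a non-final piece;
    the marker itself is not counted as a character.
    """
    index: List[int] = []
    for i, piece in enumerate(subwords):
        text = piece[: -len(marker)] if piece.endswith(marker) else piece
        index.extend([i] * len(text))
    return index


def pool_chars_to_subwords(char_feats: List[List[float]], index: List[int],
                           num_subwords: int) -> List[List[float]]:
    """Mean-pool character-level (fast-branch) vectors into one vector per subword."""
    dim = len(char_feats[0])
    sums = [[0.0] * dim for _ in range(num_subwords)]
    counts = [0] * num_subwords
    for feat, i in zip(char_feats, index):
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += feat[d]
    return [[s / counts[i] for s in sums[i]] if counts[i] else sums[i]
            for i in range(num_subwords)]


def fuse(slow: List[List[float]], pooled: List[List[float]],
         proj: List[List[float]]) -> List[List[float]]:
    """Add projected fast features to slow features (assumed additive fusion).

    proj has shape [slow_dim][fast_dim]: it maps the narrow fast width up to the slow width.
    """
    return [[s_j + sum(w * f for w, f in zip(row, p)) for s_j, row in zip(s, proj)]
            for s, p in zip(slow, pooled)]


subwords = ["trans@@", "former", "rocks"]
index = char_to_subword_index(subwords)
char_feats = [[float(k)] for k in range(len(index))]  # toy 1-d fast-branch outputs
pooled = pool_chars_to_subwords(char_feats, index, len(subwords))
slow = [[1.0, 1.0] for _ in subwords]  # toy 2-d slow-branch outputs
proj = [[1.0], [0.0]]                  # hypothetical learned 2x1 projection
fused = fuse(slow, pooled, proj)
print(pooled)  # [[2.0], [7.5], [13.0]]
print(fused)   # [[3.0, 1.0], [8.5, 1.0], [14.0, 1.0]]
```

The character stream (16 positions here) is longer than the subword stream (3 positions), which is why some alignment step is needed before the two branches can exchange information; a learned cross-attention would be an equally plausible alternative to the fixed pooling used above.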

Findings

01

Achieves consistent improvements of more than 1 BLEU point on several machine translation benchmarks.

02

Efficiently combines character and subword information in translation.

03

Demonstrates the effectiveness of multiscale features in Transformer models.

Abstract

Learning multiscale Transformer models has been shown to be a viable approach to improving machine translation systems. Prior research has primarily treated subwords as the basic units in developing such systems. However, the incorporation of fine-grained character-level features into multiscale Transformers has not yet been explored. In this work, we present a Slow-Fast two-stream learning model, referred to as TranSFormer, which uses a "slow" branch to process subword sequences and a "fast" branch to process the longer character sequences. The model is efficient because the fast branch is kept lightweight by reducing its width, yet it still provides useful fine-grained features for the slow branch. TranSFormer shows consistent BLEU improvements (larger than 1 BLEU point) on several machine translation benchmarks.
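The efficiency claim (a narrow fast branch stays cheap even though character sequences are longer) can be checked with back-of-the-envelope arithmetic. This sketch assumes a standard Transformer layer with a 4x feed-forward expansion, ignores biases, LayerNorm and embeddings, and uses hypothetical sizes (slow width 512 over 30 subwords, fast width 64 over 120 characters) that are not taken from the paper; the function names are likewise illustrative.

```python
# Rough per-layer cost model for a standard Transformer layer (assumed, not from the paper).

def layer_params(d: int, ffn_mult: int = 4) -> int:
    """Weights of one layer: Q, K, V, output projections (4 d^2)
    plus two FFN matrices (2 * ffn_mult * d^2). Biases and LayerNorm are ignored."""
    return 4 * d * d + 2 * ffn_mult * d * d


def layer_macs(n: int, d: int, ffn_mult: int = 4) -> int:
    """Multiply-accumulates for a length-n sequence: token-wise projections and FFN
    (n * params) plus attention scores (n^2 d) and the weighted sum over values (n^2 d)."""
    return n * layer_params(d, ffn_mult) + 2 * n * n * d


# Hypothetical configuration for illustration only.
slow_n, slow_d = 30, 512   # subword sequence, full model width
fast_n, fast_d = 120, 64   # ~4x longer character sequence, 1/8 of the width

slow_cost = layer_macs(slow_n, slow_d)
fast_cost = layer_macs(fast_n, fast_d)
print(layer_params(slow_d), layer_params(fast_d))  # 3145728 49152
print(slow_cost, fast_cost)                        # 95293440 7741440
print(f"fast/slow compute ratio: {fast_cost / slow_cost:.3f}")  # 0.081
```

Because parameters scale with d^2, shrinking the width by 8x cuts the fast branch's weights by 64x, which more than offsets the 4x longer input for the projection and FFN terms. The attention term grows with n^2, so under this model it becomes the dominant cost of the fast branch as character sequences get longer.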

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning in Bioinformatics

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Byte Pair Encoding · Dropout · Linear Layer · Label Smoothing · Adam · Residual Connection