Enhanced Transformer Architecture for Natural Language Processing

Woohyeon Moon; Taeyoung Kim; Bumgeun Park; Dongsoo Har

arXiv:2310.10930 · cs.CL · October 18, 2023 · 1 citation

TL;DR

This paper introduces an Enhanced Transformer architecture for NLP that improves translation quality through structural modifications rather than by stacking more Transformer blocks, reporting a BLEU score 202.96% higher than the original Transformer on the Multi30k dataset.

Contribution

The paper proposes a new Transformer structure featuring full layer normalization, weighted residual connections, reinforcement-learning-based positional encoding, and zero masked self-attention.
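The page names "weighted residual connections" and "full layer normalization" but does not define them. As a rough orientation only, the sketch below shows one common form of a weighted residual, y = x + alpha * F(x), followed by post-norm layer normalization as in the original Transformer. The scalar `alpha`, the helper names, and the exact placement of the normalization are assumptions for illustration, not the authors' method.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Layer normalization without learned gain/bias: zero mean, unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def weighted_residual(x: Vector, sublayer: Callable[[Vector], Vector],
                      alpha: float) -> Vector:
    """Hypothetical weighted residual: x + alpha * sublayer(x).

    `alpha` is an assumed scalar weight; the paper's actual weighting
    scheme is not described on this page.
    """
    fx = sublayer(x)
    return [xi + alpha * fi for xi, fi in zip(x, fx)]


def post_norm_block(x: Vector, sublayer: Callable[[Vector], Vector],
                    alpha: float) -> Vector:
    """Post-norm ordering as in the original Transformer: LayerNorm(x + alpha * F(x))."""
    return layer_norm(weighted_residual(x, sublayer, alpha))
```

For example, `weighted_residual([1.0, 2.0], lambda v: [2 * t for t in v], 0.5)` returns `[2.0, 4.0]`; setting `alpha = 1.0` recovers the plain residual connection of the original Transformer.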

Findings

01. Achieves a BLEU score 202.96% higher than the original Transformer.

02. Validated on the Multi30k translation dataset.

03. Improves translation performance through structural modifications rather than by stacking additional Transformer blocks.
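Assuming "202.96% higher" is the usual relative change, (new − baseline) / baseline × 100, the Enhanced Transformer's BLEU score would be about 3.03 times the baseline's, not 2.03 times. The page does not report the absolute scores, so the helper names and the baseline value used below are hypothetical and purely illustrative.

```python
def relative_improvement_pct(new_score: float, baseline_score: float) -> float:
    """Percentage by which new_score exceeds baseline_score."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (new_score - baseline_score) / baseline_score * 100.0


def score_from_improvement(baseline_score: float, improvement_pct: float) -> float:
    """Inverse of the formula above: the score that is improvement_pct % above baseline."""
    return baseline_score * (1.0 + improvement_pct / 100.0)
```

With an illustrative (not reported) baseline of 10 BLEU, `score_from_improvement(10.0, 202.96)` gives roughly 30.3.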

Abstract

The Transformer is a state-of-the-art model in natural language processing (NLP). Current NLP models primarily improve performance by increasing the number of stacked Transformer blocks, but this approach demands substantial training resources, such as computing capacity. This paper proposes a novel Transformer structure featuring full layer normalization, weighted residual connections, positional encoding that exploits reinforcement learning, and zero masked self-attention. The proposed model, called the Enhanced Transformer, is validated using the bilingual evaluation understudy (BLEU) score on the Multi30k translation dataset, where it achieves a BLEU score 202.96% higher than that of the original Transformer.
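For reference, the sketch below shows the fixed sinusoidal positional encoding of the original Transformer ("Attention Is All You Need", listed under Absolute Position Encodings in the taxonomy). It shows only that baseline scheme. How the paper's reinforcement-learning-based encoding departs from it is not described on this page, and the function name is illustrative.

```python
import math
from typing import List


def sinusoidal_positional_encoding(max_len: int, d_model: int) -> List[List[float]]:
    """Fixed absolute positional encoding from the original Transformer.

    PE[pos][2k]   = sin(pos / 10000^(2k / d_model))
    PE[pos][2k+1] = cos(pos / 10000^(2k / d_model))
    """
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        # i runs over even dimensions, so i == 2k and the exponent is i / d_model.
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:  # guard for odd d_model
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

Position 0 always maps to alternating 0.0 and 1.0 (sin 0 and cos 0), and each dimension pair oscillates at a different wavelength. These vectors are added to the token embeddings before the first layer.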

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Linear Layer · Softmax · Residual Connection · Absolute Position Encodings · Layer Normalization · Adam · Byte Pair Encoding