An Augmented Transformer Architecture for Natural Language Generation Tasks
Hailiang Li, Adele Y.C. Wang, Yang Liu, Du Tang, Zhibin Lei, Wenye Li

TL;DR
This paper introduces an augmented Transformer architecture that enhances positional encoding and incorporates linguistic knowledge like POS tags, leading to improved performance in natural language generation tasks such as translation and summarization.
Contribution
It proposes a novel augmentation of the Transformer with improved positional encoding and linguistic features, achieving better results than standard models.
Findings
Enhanced positional encoding improves sequence representation.
Incorporating POS tags boosts translation and summarization quality.
The augmented Transformer consistently outperforms the vanilla Transformer in the reported experiments.
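One way to picture the POS-tag finding is as an extra embedding lookup combined with the token embedding before the encoder. The sketch below is only an illustration under stated assumptions: the summary does not say how the paper injects POS information, so the element-wise sum, the helper names `make_embedding_table` and `embed_with_pos`, the tag set, and the random initialization are all hypothetical.

```python
import random


def make_embedding_table(vocab, dim, seed):
    """Build a toy lookup table of random vectors (hypothetical helper)."""
    rng = random.Random(seed)
    return {sym: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for sym in vocab}


def embed_with_pos(tokens, pos_tags, token_table, pos_table):
    """Sum each token embedding with the embedding of its POS tag.

    Assumption: element-wise addition. Concatenation or another fusion
    scheme would be equally plausible; the source does not specify one.
    """
    if len(tokens) != len(pos_tags):
        raise ValueError("tokens and pos_tags must have the same length")
    return [
        [t + p for t, p in zip(token_table[tok], pos_table[tag])]
        for tok, tag in zip(tokens, pos_tags)
    ]


# Toy example with a hand-written tagging (not produced by a real tagger).
tokens = ["the", "cat", "sleeps"]
pos_tags = ["DET", "NOUN", "VERB"]
token_table = make_embedding_table(tokens, dim=8, seed=0)
pos_table = make_embedding_table(["DET", "NOUN", "VERB"], dim=8, seed=1)

augmented = embed_with_pos(tokens, pos_tags, token_table, pos_table)
print(len(augmented), len(augmented[0]))
```

In a real model the two tables would be trained jointly with the rest of the network, and the tags would come from a POS tagger rather than being written by hand.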
Abstract
Transformer-based neural networks have shown significant advantages on most evaluations of natural language processing and other sequence-to-sequence tasks, owing to the inherent strengths of their architecture. Although the main architecture of the Transformer has been explored continuously, little attention has been paid to the positional encoding module. In this paper, we enhance the sinusoidal positional encoding algorithm by maximizing the variances between encoded consecutive positions to obtain an additional improvement. Furthermore, we propose an augmented Transformer architecture encoded with additional linguistic knowledge, such as Part-of-Speech (POS) tagging, to boost performance on some natural language generation tasks, e.g., automatic translation and summarization. Experiments show that the proposed architecture attains constantly superior…
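For context, the baseline that the abstract sets out to enhance is the standard sinusoidal positional encoding of the original Transformer, where even dimensions use sin(pos / 10000^(2i/d)) and odd dimensions use the matching cosine. The sketch below implements only that baseline in plain Python. The `consecutive_distance` helper is a hypothetical diagnostic added for illustration; it is not the paper's variance-maximizing procedure, which the truncated abstract does not describe.

```python
import math


def sinusoidal_positional_encoding(num_positions, d_model, base=10000.0):
    """Return a num_positions x d_model table of standard sinusoidal encodings."""
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model // 2):
            angle = pos / (base ** (2 * i / d_model))
            row.append(math.sin(angle))  # dimension 2i
            row.append(math.cos(angle))  # dimension 2i + 1
        table.append(row)
    return table


def consecutive_distance(table, pos):
    """Euclidean distance between the encodings of pos and pos + 1.

    Illustrative diagnostic only (an assumption of this sketch).
    """
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(table[pos], table[pos + 1]))
    )


pe = sinusoidal_positional_encoding(num_positions=50, d_model=16)
print(len(pe), len(pe[0]))
print(consecutive_distance(pe, 0), consecutive_distance(pe, 10))
```

For the baseline, the distance between neighbouring positions is the same at every position, because each sine/cosine pair contributes 2 - 2cos(w) for its frequency w regardless of pos. The separation between consecutive positions is therefore fixed by the choice of frequencies, which is the kind of quantity an enhanced encoding could try to adjust.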
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
