Segatron: Segment-Aware Transformer for Language Modeling and Understanding

He Bai; Peng Shi; Jimmy Lin; Yuqing Xie; Luchen Tan; Kun Xiong; Wen Gao; Ming Li

arXiv:2004.14996·cs.CL·December 17, 2020·1 cites

TL;DR

Segatron introduces a segment-aware positional encoding mechanism to Transformer models, enhancing their contextual understanding and performance across language modeling and NLP tasks.

Contribution

The paper proposes a segment-aware position encoding for Transformers that combines paragraph, sentence, and token indices, improving language-model perplexity over Transformer-XL and NLP task performance over vanilla BERT.

Findings

01

Achieves 17.1 perplexity on WikiText-103 with Transformer-XL.

02

SegaBERT outperforms vanilla BERT on multiple NLP tasks.

03

Outperforms RoBERTa in zero-shot sentence representation learning.

Abstract

Transformers are powerful for sequence modeling. Nearly all state-of-the-art language models and pre-trained language models are based on the Transformer architecture. However, the architecture distinguishes sequential tokens only by their token position index. We hypothesize that better contextual representations can be generated from the Transformer with richer positional information. To verify this, we propose a segment-aware Transformer (Segatron), which replaces the original token position encoding with a combined position encoding of paragraph, sentence, and token. We first introduce the segment-aware mechanism to Transformer-XL, a popular Transformer-based language model with memory extension and relative position encoding. We find that our method further improves the Transformer-XL base and large models, achieving 17.1 perplexity on the WikiText-103 dataset. We further…
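The combined position encoding described in the abstract can be illustrated with a minimal sketch. It assumes that the sentence index restarts in each paragraph, that the token index restarts in each sentence, and that the three lookups are combined by summation, as is common for absolute position embeddings; the abstract confirms none of these details. The function names, table sizes, and random values are hypothetical, and the relative-position variant used with Transformer-XL is not covered.

```python
import random


def segment_positions(document):
    """Return one (paragraph_idx, sentence_idx, token_idx) triple per token.

    `document` is a list of paragraphs; each paragraph is a list of
    sentences; each sentence is a list of token strings.
    Assumption: sentence_idx restarts per paragraph, token_idx per sentence.
    """
    positions = []
    for p_idx, paragraph in enumerate(document):
        for s_idx, sentence in enumerate(paragraph):
            for t_idx, _token in enumerate(sentence):
                positions.append((p_idx, s_idx, t_idx))
    return positions


def make_table(num_rows, dim, rng):
    """Build a toy embedding table with small random values (stand-in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


def combined_position_encoding(positions, para_table, sent_table, tok_table):
    """Sum the paragraph, sentence, and token embeddings for every token (assumed combination)."""
    encodings = []
    for p_idx, s_idx, t_idx in positions:
        rows = zip(para_table[p_idx], sent_table[s_idx], tok_table[t_idx])
        encodings.append([p + s + t for p, s, t in rows])
    return encodings


# Toy document: two paragraphs, three sentences, twelve tokens.
doc = [
    [["Transformers", "are", "powerful", "."], ["They", "use", "positions", "."]],
    [["Segatron", "adds", "segments", "."]],
]

DIM = 4
rng = random.Random(0)
positions = segment_positions(doc)
para_table = make_table(2, DIM, rng)  # up to 2 paragraphs
sent_table = make_table(2, DIM, rng)  # up to 2 sentences per paragraph
tok_table = make_table(4, DIM, rng)   # up to 4 tokens per sentence
encodings = combined_position_encoding(positions, para_table, sent_table, tok_table)
```

In this sketch, the first tokens of the two sentences in the first paragraph share a paragraph and token index and differ only in sentence index, so their encodings differ only by the sentence embedding. A plain token-position index would not expose that structure.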

Code & Models

Repositories

rsvp-ai/segatron_aaai · PyTorch · Official

Videos

Segatron: Segment-Aware Transformer for Language Modeling and Understanding · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Cosine Annealing · Adaptive Input Representations · Linear Warmup With Cosine Annealing · Adaptive Softmax · Variational Dropout · Transformer-XL · RoBERTa