Text Compression-aided Transformer Encoding

Zuchao Li; Zhuosheng Zhang; Hai Zhao; Rui Wang; Kehai Chen; Masao Utiyama; and Eiichiro Sumita

arXiv:2102.05951·cs.CL·February 12, 2021


TL;DR

This paper introduces explicit and implicit text compression methods to enhance Transformer encodings in NLP, improving downstream task performance by focusing on the core meaning of input text.

Contribution

It proposes explicit and implicit text compression techniques that are integrated into the Transformer encoder to enhance language representations and improve performance on several downstream NLP tasks.

Findings

1. Compression approaches improve model accuracy
2. Enhanced focus on the core meaning of the text
3. Better representations lead to improved downstream results

Abstract

Text encoding is one of the most important steps in Natural Language Processing (NLP). The self-attention mechanism in the current state-of-the-art Transformer encoder performs it well and has brought significant improvements in the performance of many NLP tasks. Although the Transformer encoder may effectively capture general information in its resulting representations, it does not specifically focus on the backbone information, that is, the gist of the input text. In this paper, we propose explicit and implicit text compression approaches to enhance Transformer encoding, and we evaluate models using these approaches on several typical downstream tasks that rely heavily on the encoding. Our explicit text compression approaches use dedicated models to compress text, while our implicit text compression approach simply adds an additional module to the main model to handle text…
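The abstract says that compressed (backbone) information is used to enhance the Transformer encoding, but the fusion details are cut off here. As a rough illustration only, the sketch below assumes one plausible way to inject a backbone representation: each token state from the main encoder attends over the encoding of the compressed text with scaled dot-product attention, and the attended context is mixed back in with a fixed gate. The functions `attend` and `fuse_backbone` and the `gate` parameter are hypothetical names, not the paper's code; in a real model the vectors would come from trained encoders and the gate would typically be learned.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    # Scaled dot-product attention for a single query vector.
    # Assumes queries, keys, and values all share the same dimension d.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)]


def fuse_backbone(text_states: List[Vector], backbone_states: List[Vector],
                  gate: float = 0.5) -> List[Vector]:
    # Hypothetical fusion: every full-text token state queries the (shorter)
    # backbone encoding, and the attended context is linearly interpolated
    # with the original state using a fixed gate in [0, 1].
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must be in [0, 1]")
    fused = []
    for h in text_states:
        c = attend(h, backbone_states, backbone_states)
        fused.append([(1.0 - gate) * hi + gate * ci for hi, ci in zip(h, c)])
    return fused


# Toy usage: 3 token states from the main encoder, 1 state from the compressed text.
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
backbone = [[1.0, 0.0]]
out = fuse_backbone(text, backbone, gate=0.5)
print(out)  # → [[1.0, 0.0], [0.5, 0.5], [1.0, 0.5]]
```

With a single backbone vector the attention weight is 1, so every token is pulled halfway toward that vector; setting `gate=0.0` returns the original encoder states unchanged, which makes the contribution of the compressed text easy to isolate.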


Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Softmax · Label Smoothing · Byte Pair Encoding · Layer Normalization · Dense Connections · Multi-Head Attention