Text Compression-aided Transformer Encoding
Zuchao Li, Zhuosheng Zhang, Hai Zhao, Rui Wang, Kehai Chen, Masao Utiyama, and Eiichiro Sumita

TL;DR
This paper introduces explicit and implicit text compression methods to enhance Transformer encodings in NLP, improving downstream task performance by focusing on the core meaning of input text.
Contribution
It proposes novel text compression techniques integrated into Transformer models, which enhance language representations and boost performance on various NLP tasks.
Findings
Compression approaches improve model accuracy
Enhanced focus on core text meaning
Better representations lead to improved downstream results
Abstract
Text encoding is one of the most important steps in Natural Language Processing (NLP). It has been done well by the self-attention mechanism in the current state-of-the-art Transformer encoder, which has brought about significant improvements in the performance of many NLP tasks. Though the Transformer encoder may effectively capture general information in its resulting representations, the backbone information, meaning the gist of the input text, is not specifically focused on. In this paper, we propose explicit and implicit text compression approaches to enhance the Transformer encoding and evaluate models using this approach on several typical downstream tasks that rely on the encoding heavily. Our explicit text compression approaches use dedicated models to compress text, while our implicit text compression approach simply adds an additional module to the main model to handle text…
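The abstract does not say how the compressed text is combined with the encoder output. As an illustration only, the sketch below assumes one plausible reading: each token representation attends over the representations of a compressed "backbone" version of the text, and the attended summary is added back to the token. The names `attend` and `fuse_backbone`, the additive fusion, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys):
    """Scaled dot-product attention in which the keys also serve as values."""
    dim = len(keys[0])
    attended = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)]
        )
    return attended


def fuse_backbone(token_states, backbone_states):
    """Hypothetical fusion: add a backbone summary to every token state.

    Assumption for illustration; the paper's actual integration may differ.
    """
    context = attend(token_states, backbone_states)
    return [
        [t + c for t, c in zip(token, ctx)]
        for token, ctx in zip(token_states, context)
    ]


# Toy example: three token vectors and a single backbone vector.
token_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
backbone_states = [[1.0, 0.0]]
fused = fuse_backbone(token_states, backbone_states)
```

With only one backbone vector, every token receives the same summary, so the fusion reduces to adding that vector to each token state. With several backbone vectors, each token would instead receive its own weighted mixture.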
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Softmax · Label Smoothing · Byte Pair Encoding · Layer Normalization · Dense Connections · Multi-Head Attention
