Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Zihang Dai; Guokun Lai; Yiming Yang; Quoc V. Le

arXiv:2006.03236·cs.LG·June 8, 2020·104 cites

TL;DR

Funnel-Transformer introduces a sequence compression approach that reduces computational redundancy in language models, enabling more efficient processing while maintaining or improving performance across various NLP tasks.

Contribution

The paper proposes Funnel-Transformer, an architecture that gradually compresses the sequence of hidden states to cut computation, reinvests the saved FLOPs in a deeper or wider model to raise capacity, and can still recover a representation for each token when token-level predictions are required.

Findings

01. Outperforms the standard Transformer with fewer or comparable FLOPs

02. Achieves superior results on text classification, language understanding, and reading comprehension

03. Demonstrates improved efficiency and scalability in language processing tasks

Abstract

With the success of language pretraining, it is highly desirable to develop more efficient architectures with good scalability that can exploit the abundant unlabeled data at a lower cost. To improve efficiency, we examine the much-overlooked redundancy in maintaining a full-length token-level representation, especially for tasks that only require a single-vector representation of the sequence. With this intuition, we propose Funnel-Transformer, which gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further improve the model capacity. In addition, to perform token-level predictions as required by common pretraining objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced…
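The length reduction described in the abstract can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the abstract does not name the compression operator, so `mean_pool` (hypothetical) uses strided mean pooling with stride 2; `funnel_lengths` and `relative_attention_cost` (also hypothetical) assume pooling between blocks, an equal number of layers per block, and an attention-score cost that grows with the square of the sequence length. This is not the authors' implementation.

```python
from typing import List

Vector = List[float]


def mean_pool(hidden: List[Vector], stride: int = 2) -> List[Vector]:
    """Shorten a sequence by averaging each window of `stride` consecutive vectors.

    ASSUMPTION: mean pooling is an illustrative choice of compression operator.
    A trailing partial window is averaged over the vectors it actually contains.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    pooled: List[Vector] = []
    for start in range(0, len(hidden), stride):
        window = hidden[start:start + stride]
        dim = len(window[0])
        pooled.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return pooled


def funnel_lengths(seq_len: int, num_blocks: int, stride: int = 2) -> List[int]:
    """Sequence length seen by each block when pooling is applied between blocks."""
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        lengths.append(-(-lengths[-1] // stride))  # ceiling division
    return lengths


def relative_attention_cost(lengths: List[int]) -> float:
    """Attention-score cost relative to keeping the full length in every block.

    ASSUMPTION: cost per block scales with length squared, and every block
    has the same number of layers.
    """
    full_cost = lengths[0] ** 2 * len(lengths)
    return sum(n * n for n in lengths) / full_cost


if __name__ == "__main__":
    states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(mean_pool(states))                 # [[2.0, 3.0], [5.0, 6.0]]
    lengths = funnel_lengths(512, num_blocks=3)
    print(lengths)                           # [512, 256, 128]
    print(relative_attention_cost(lengths))  # 0.4375
```

Under these assumptions, a 512-token input processed by three blocks is seen at lengths 512, 256, and 128, and the attention-score computation costs about 44% of the full-length equivalent. That gap is the budget the abstract describes re-investing in a deeper or wider model.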

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Funnel Transformer · Residual Connection · Label Smoothing · Multi-Head Attention · Adam · Dropout