Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le

TL;DR
Funnel-Transformer progressively compresses the sequence of hidden states to remove the computational redundancy of full-length token representations. This makes language models cheaper to run while matching or improving accuracy on text classification, language understanding, and reading comprehension.
Contribution
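The paper proposes Funnel-Transformer, an architecture that does three things. It progressively shortens the sequence of hidden states to reduce computation. It re-invests the saved FLOPs in a deeper or wider model to increase capacity. It also recovers a deep per-token representation from the compressed sequence, so token-level pretraining objectives remain possible.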
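The compression step can be illustrated with a minimal NumPy sketch. It assumes strided mean pooling with window and stride 2. It also assumes a "pool-query-only" attention layer, in which only the queries are pooled while the keys and values keep the full-length sequence. To stay short, the sketch uses a single head and omits learned projections and LayerNorm. The helpers `mean_pool`, `pool_query_only_attention` and `funnel_lengths` are hypothetical names and are not the authors' code. Applying the layer twice mimics the length reduction at the start of the second and third blocks of a three-block funnel.

```python
import numpy as np


def mean_pool(h: np.ndarray, stride: int = 2) -> np.ndarray:
    """Strided mean pooling along the sequence axis: (T, d) -> (T // stride, d)."""
    t, d = h.shape
    t_out = t // stride
    # Trailing tokens that do not fill a whole window are dropped (simplifying assumption).
    return h[: t_out * stride].reshape(t_out, stride, d).mean(axis=1)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pool_query_only_attention(h: np.ndarray, stride: int = 2) -> np.ndarray:
    """Single-head attention in which only the query sequence is pooled.

    Keys and values keep the full-length sequence, so each pooled position can
    still read from every unpooled token. No learned projections or LayerNorm
    (hypothetical simplification).
    """
    q = mean_pool(h, stride)                      # (T / stride, d)
    k = v = h                                     # (T, d)
    scores = q @ k.T / np.sqrt(h.shape[1])        # (T / stride, T)
    return q + softmax(scores, axis=-1) @ v       # residual around the pooled query


def funnel_lengths(seq_len: int, num_blocks: int, stride: int = 2) -> list:
    """Sequence length seen by each block when every new block pools once."""
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        lengths.append(lengths[-1] // stride)
    return lengths


rng = np.random.default_rng(0)
h = rng.standard_normal((512, 64))
for _ in range(2):            # pooling at the start of blocks 2 and 3
    h = pool_query_only_attention(h)
print(h.shape)                # (128, 64)
print(funnel_lengths(512, 3)) # [512, 256, 128]
```

In this design the pooled queries still attend over the uncompressed keys and values. The compression step therefore summarizes information from the whole sequence rather than only averaging neighbouring vectors.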
Findings
Outperforms the standard Transformer at comparable or lower FLOPs
Achieves better results on text classification, language understanding, and reading comprehension
Scales more efficiently, since FLOPs saved by length reduction can be spent on a deeper or wider model
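The rough cost model below shows why saved FLOPs can be re-invested. It compares a 12-layer Transformer with an 18-layer funnel split into three 6-layer blocks, where the sequence length halves at each new block. Both configurations are illustrative assumptions. So is the per-layer formula: 12·T·d² multiply-accumulates for the attention projections and a 4×-wide feed-forward layer, plus 2·T²·d for attention. The estimate ignores embeddings and the decoder. It also ignores the fact that the first layer of a pooled block still attends over the longer sequence. `layer_macs` and `model_macs` are hypothetical helpers.

```python
def layer_macs(seq_len: int, d_model: int) -> int:
    """Approximate multiply-accumulates for one Transformer layer (rough model).

    Assumes four d x d attention projections plus a feed-forward layer with a
    4x inner size (8 d x d) per token, and 2 * T^2 * d for the attention
    scores and the weighted sum of values.
    """
    linear = 12 * seq_len * d_model ** 2
    attention = 2 * seq_len ** 2 * d_model
    return linear + attention


def model_macs(block_layers: list, seq_len: int, d_model: int, stride: int = 2) -> int:
    """Sum layer costs, shrinking the sequence by `stride` at each new block."""
    total, t = 0, seq_len
    for i, n_layers in enumerate(block_layers):
        if i > 0:
            t //= stride  # pooling at the start of every block after the first
        total += n_layers * layer_macs(t, d_model)
    return total


standard = model_macs([12], seq_len=512, d_model=768)     # 12 layers, no pooling
funnel = model_macs([6, 6, 6], seq_len=512, d_model=768)  # 18 layers in 3 blocks
print(f"funnel / standard = {funnel / standard:.3f}")     # prints funnel / standard = 0.853
```

Under these assumptions, the deeper funnel costs about 85% of the shallower baseline. The paper's own FLOP accounting differs in detail.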
Abstract
With the success of language pretraining, it is highly desirable to develop more efficient architectures of good scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the much-overlooked redundancy in maintaining a full-length token-level representation, especially for tasks that only require a single-vector representation of the sequence. With this intuition, we propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further improve the model capacity. In addition, to perform token-level predictions as required by common pretraining objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced…
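For token-level objectives, Funnel-Transformer uses a decoder that up-samples the compressed states back to full length. The minimal sketch below assumes that the up-sampling repeats each compressed vector stride^(M−1) times, where M is the number of blocks. It then adds the full-length output of the first block as a residual. The decoder Transformer layers that refine this sum are omitted. `recover_token_states` is a hypothetical name.

```python
import numpy as np


def recover_token_states(h_first_block: np.ndarray,
                         h_last_block: np.ndarray,
                         num_blocks: int,
                         stride: int = 2) -> np.ndarray:
    """Up-sample compressed states to full length and add a residual.

    h_first_block: (T, d) full-length output of the first block.
    h_last_block:  (T / stride**(num_blocks - 1), d) output of the last block.
    """
    factor = stride ** (num_blocks - 1)
    # Repeat each compressed vector `factor` times along the sequence axis.
    up = np.repeat(h_last_block, factor, axis=0)
    if up.shape[0] != h_first_block.shape[0]:
        raise ValueError("sequence length must be divisible by stride ** (num_blocks - 1)")
    # The residual from the uncompressed first block restores token-level detail.
    return up + h_first_block


rng = np.random.default_rng(0)
h1 = rng.standard_normal((512, 64))   # first-block output (full length)
h3 = rng.standard_normal((128, 64))   # third-block output (after two 2x poolings)
g = recover_token_states(h1, h3, num_blocks=3)
print(g.shape)                        # (512, 64)
```

Only the cheap repeat-and-add and a small decoder run at full length. Most of the depth therefore still operates on the shortened sequence.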
Code & Models
- 🤗 funnel-transformer/intermediate-base (model) · 7 downloads
- 🤗 funnel-transformer/intermediate (model) · 2 downloads
- 🤗 funnel-transformer/large-base (model) · 5 downloads · ♡ 1
- 🤗 funnel-transformer/large (model) · 15 downloads · ♡ 2
- 🤗 funnel-transformer/medium-base (model) · 42 downloads
- 🤗 funnel-transformer/medium (model) · 27 downloads
- 🤗 funnel-transformer/small-base (model) · 49 downloads
- 🤗 funnel-transformer/small (model) · 133k downloads · ♡ 6
- 🤗 funnel-transformer/xlarge-base (model) · 11 downloads
- 🤗 funnel-transformer/xlarge (model) · 5 downloads · ♡ 1
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Funnel Transformer · Residual Connection · Label Smoothing · Multi-Head Attention · Adam · Dropout
