Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification
Jungmin Yun, Mihyeon Kim, Youngbin Kim

TL;DR
This paper introduces a method that combines fuzzy logic-based token pruning with token combining to improve efficiency and accuracy in transformer-based document classification, reducing memory and compute costs while improving classification performance.
Contribution
It presents an integrated approach that pairs token pruning, guided by fuzzy logic to limit mispruning, with token combining to enhance transformer efficiency and accuracy in document classification tasks.
Findings
Achieved +5% accuracy over baseline BERT.
Reduced memory cost to 0.61x of original.
Sped up processing by 1.64x.
Abstract
Transformer-based models have achieved dominant performance in numerous NLP tasks. Despite their remarkable successes, pre-trained transformers such as BERT suffer from a computationally expensive self-attention mechanism that interacts with all tokens, including the ones unfavorable to classification performance. To overcome these challenges, we propose integrating two strategies: token pruning and token combining. Token pruning eliminates less important tokens in the attention mechanism's key and value as they pass through the layers. Additionally, we adopt fuzzy logic to handle uncertainty and alleviate potential mispruning risks arising from an imbalanced distribution of each token's importance. Token combining, on the other hand, condenses input sequences into smaller sizes in order to further compress the model. By integrating these two approaches, we not only improve the model's…
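The key/value pruning step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`attention_weights`, `prune_keys_values`), the importance score (mean attention a position receives across all queries), and the fixed `keep` budget are hypothetical and are not taken from the paper. In particular, the paper's fuzzy logic-based handling of importance and its token combining module are not reproduced.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights, one row per query."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def prune_keys_values(queries, keys, values, keep):
    """Keep the `keep` key/value positions that receive the most attention.

    Importance is the mean attention a position receives over all queries.
    This is an assumed heuristic, not the paper's fuzzy-logic score.
    Queries are left untouched; only keys and values shrink.
    """
    weights = attention_weights(queries, keys)
    importance = [
        sum(row[j] for row in weights) / len(queries) for j in range(len(keys))
    ]
    ranked = sorted(range(len(keys)), key=lambda j: importance[j], reverse=True)
    kept = sorted(ranked[:keep])  # restore original token order
    return [keys[j] for j in kept], [values[j] for j in kept], kept


# Toy input: 2 queries, 4 key/value positions, hidden size 2.
queries = [[1.0, 0.0], [1.0, 0.0]]
keys = [[4.0, 0.0], [0.0, 0.0], [3.0, 0.0], [0.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]

pruned_keys, pruned_values, kept = prune_keys_values(queries, keys, values, keep=2)
```

In this toy input, positions 0 and 2 have the largest dot products with both queries, so they are the ones retained. Because later layers attend only over the retained keys and values, the attention matrix shrinks from (queries × 4) to (queries × 2), which is the source of the memory and speed savings this kind of pruning targets.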
Taxonomy
Methods: WordPiece · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Linear Layer · Adam · Attention Is All You Need · Residual Connection · Multi-Head Attention
