Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification

Jungmin Yun; Mihyeon Kim; Youngbin Kim

arXiv:2406.01283 · cs.CL · June 4, 2024

TL;DR

This paper combines token pruning, guided by fuzzy logic to limit mispruning, with token combining to make transformer-based document classification more efficient and more accurate, reducing computational cost while improving performance.

Contribution

It presents an integrated approach that pairs fuzzy-logic-based token pruning with token combining to improve transformer efficiency and accuracy in document classification tasks.

Findings

01

Achieved +5% accuracy over baseline BERT.

02

Reduced memory cost to 0.61x of original.

03

Sped up processing by 1.64x.

Abstract

Transformer-based models have achieved dominant performance in numerous NLP tasks. Despite their remarkable successes, pre-trained transformers such as BERT suffer from a computationally expensive self-attention mechanism that interacts with all tokens, including the ones unfavorable to classification performance. To overcome these challenges, we propose integrating two strategies: token pruning and token combining. Token pruning eliminates less important tokens in the attention mechanism's key and value as they pass through the layers. Additionally, we adopt fuzzy logic to handle uncertainty and alleviate potential mispruning risks arising from an imbalanced distribution of each token's importance. Token combining, on the other hand, condenses input sequences into smaller sizes in order to further compress the model. By integrating these two approaches, we not only improve the model's…
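The pruning step described in the abstract can be illustrated with a minimal sketch. It assumes a common importance heuristic (the average attention each key token receives across all queries) and a fixed keep ratio; both are illustrative assumptions, not the authors' exact scoring rule. The fuzzy-logic handling of uncertain tokens and the token-combining module are not modeled here, and all function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_probs(queries, keys):
    """Scaled dot-product attention probabilities, one row per query."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def token_importance(probs):
    """Assumed heuristic: mean attention each key token receives over all queries."""
    n_queries = len(probs)
    return [sum(row[j] for row in probs) / n_queries for j in range(len(probs[0]))]


def prune_keys_values(keys, values, importance, keep_ratio):
    """Keep the most important key/value tokens, preserving their original order."""
    n_keep = max(1, math.ceil(len(keys) * keep_ratio))
    ranked = sorted(range(len(keys)), key=lambda j: importance[j], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [keys[j] for j in kept], [values[j] for j in kept], kept


# Toy input: 4 tokens with 2-dimensional queries and keys, 1-dimensional values.
queries = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.2], [0.8, 0.0]]
keys = [[2.0, 0.0], [0.0, 0.0], [1.5, 0.5], [-2.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]

probs = attention_probs(queries, keys)
importance = token_importance(probs)
pruned_keys, pruned_values, kept = prune_keys_values(
    keys, values, importance, keep_ratio=0.5
)
print("kept token indices:", kept)
```

Only the keys and values are pruned, so every query still produces an output, but each attends over a shorter sequence. That is where the memory and speed savings in this kind of scheme come from.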

Taxonomy

Methods: WordPiece · Linear Warmup With Linear Decay · Weight Decay · Attention Dropout · Linear Layer · Adam · Attention Is All You Need · Residual Connection · Multi-Head Attention