CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling

Yu Bai; Xiyuan Zou; Heyan Huang; Sanxing Chen; Marc-Antoine Rondeau; Yang Gao; Jackie Chi Kit Cheung

arXiv:2406.12018·cs.CL·October 10, 2024

TL;DR

CItruS improves long sequence modeling by evicting hidden states from the key-value cache according to their relevance to the downstream task, which improves task performance without sacrificing language modeling perplexity.

Contribution

It introduces Chunked Instruction-aware State Eviction (CItruS), a training-free technique that incorporates attention preferences into state eviction for better downstream task performance.

Findings

01. Outperforms strong baselines on long sequence comprehension tasks.

02. Maintains language modeling perplexity while improving downstream task results.

03. Chunked sequence processing further improves efficiency.

Abstract

Long sequence modeling has gained broad interest as large language models (LLMs) continue to advance. Recent research has identified that a large portion of hidden states within the key-value caches of Transformer models can be discarded (also termed evicted) without affecting the perplexity performance in generating long sequences. However, we show that these methods, despite preserving perplexity performance, often drop information that is important for solving downstream tasks, a problem which we call information neglect. To address this issue, we introduce Chunked Instruction-aware State Eviction (CItruS), a novel modeling technique that integrates the attention preferences useful for a downstream task into the eviction process of hidden states. In addition, we design a method for chunked sequence processing to further improve efficiency. Our training-free method exhibits superior…
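The idea summarized in the abstract, keeping only the cached states that a task instruction attends to while reading the input chunk by chunk, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`attention_scores`, `evict`, `process_in_chunks`), the single-query top-k selection rule, and the fixed `budget` are all assumptions made for illustration, and real key-value caches hold per-layer, per-head tensors rather than plain lists.

```python
import math


def attention_scores(query, keys):
    """Softmax of scaled dot products between one query and each key."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


def evict(cache, query, budget):
    """Keep the `budget` cached (position, key) pairs that `query` attends
    to most, preserving their original order. Hypothetical selection rule."""
    if len(cache) <= budget:
        return list(cache)
    scores = attention_scores(query, [key for _, key in cache])
    ranked = sorted(range(len(cache)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])
    return [cache[i] for i in keep]


def process_in_chunks(keys, instruction_query, chunk_size, budget):
    """Read `keys` chunk by chunk; after each chunk, evict cached states
    down to `budget` using the instruction query (an assumed stand-in for
    the task-specific attention preference). Returns surviving positions."""
    cache = []
    for start in range(0, len(keys), chunk_size):
        chunk = keys[start:start + chunk_size]
        cache.extend((start + offset, key) for offset, key in enumerate(chunk))
        cache = evict(cache, instruction_query, budget)
    return [position for position, _ in cache]


if __name__ == "__main__":
    # Toy keys: even positions align with the instruction query, odd ones do not.
    toy_keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0],
                [0.0, 2.0], [3.0, 0.0], [0.0, 3.0]]
    print(process_in_chunks(toy_keys, [1.0, 0.0], chunk_size=2, budget=2))
```

With these toy keys, the positions that survive are the ones whose keys align most strongly with the instruction query, and the cache never holds more than `budget + chunk_size` states at a time. Scoring with a query derived from the language-modeling context instead of the instruction would be the kind of perplexity-oriented eviction that the abstract says can drop task-relevant information.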

Code & Models

Repositories

ybai-nlp/CItruS (PyTorch, official)

Videos

CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling (Underline)

Taxonomy

Topics: Neural Networks and Applications · Time Series Analysis and Forecasting · Parallel Computing and Optimization Techniques

Methods: Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Attention Is All You Need · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer