LoCoCo: Dropping In Convolutions for Long Context Compression

Ruisi Cai; Yuandong Tian; Zhangyang Wang; Beidi Chen

arXiv:2406.05317 · cs.LG · October 29, 2024

TL;DR

LoCoCo introduces a data-driven, convolution-based method for compressing long context sequences in LLMs, enabling efficient inference and tuning with minimal accuracy loss.

Contribution

It presents a novel adaptive fusion technique using convolutional kernels to dynamically blend KV pairs, allowing fixed-size caches to effectively represent long contexts.

Findings

01. Compressed up to 3482 tokens into a 128-size KV cache with minimal performance loss

02. Extended context length from 4K to 32K with a 512-size KV cache during tuning

03. Achieved accuracy improvements of up to 0.2791 over baselines at the same cache size
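
To put the cache sizes above in perspective, the short calculation below works out the implied compression ratios. It is only arithmetic on the reported figures; the helper name `compression_ratio` is made up for this example, and reading "4K" and "32K" as 4096 and 32768 tokens is an assumption.

```python
def compression_ratio(context_tokens, cache_slots):
    """Number of context tokens represented per KV cache slot."""
    return context_tokens / cache_slots

# Finding 01: 3482 tokens held in a 128-slot cache.
inference_ratio = compression_ratio(3482, 128)      # 27.203125

# Finding 02: assuming "32K" means 32 * 1024 = 32768 tokens.
tuning_ratio = compression_ratio(32 * 1024, 512)    # 64.0

# Context extension factor during tuning, assuming 4K = 4096 tokens.
extension_factor = (32 * 1024) / (4 * 1024)         # 8.0

print(inference_ratio, tuning_ratio, extension_factor)
```

Under these assumptions, each cache slot stands in for roughly 27 tokens in the inference setting and 64 tokens in the tuning setting.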

Abstract

This paper tackles the memory hurdle of processing long context sequences in Large Language Models (LLMs) by presenting a novel approach, Dropping In Convolutions for Long Context Compression (LoCoCo). LoCoCo employs only a fixed-size Key-Value (KV) cache and can enhance efficiency in both the inference and fine-tuning stages. Diverging from prior methods that selectively drop KV pairs based on heuristics, LoCoCo leverages a data-driven adaptive fusion technique, blending previous KV pairs with incoming tokens to minimize the loss of contextual information and ensure accurate attention modeling. This token integration is achieved by injecting one-dimensional convolutional kernels that dynamically calculate mixing weights for each KV cache slot. Designed for broad compatibility with existing LLM frameworks, LoCoCo allows for straightforward "drop-in" integration without needing…
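
The fusion step described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`conv1d_same`, `fuse_into_cache`), the use of one scalar score per KV pair as the convolution input, the softmax normalization, and the random kernels standing in for learned ones are all assumptions made for the example.

```python
import math
import random

def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output has len(signal) entries."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(kernel[t] * padded[i + t] for t in range(len(kernel)))
            for i in range(len(signal))]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_into_cache(cache, incoming, scores, kernels):
    """Blend old cache slots and incoming vectors into len(kernels) new slots.

    cache, incoming: lists of equal-length vectors (stand-ins for keys or values).
    scores: one scalar per candidate (hypothetical input to the convolution).
    kernels: one 1-D kernel per output slot (random here, learned in practice).
    """
    candidates = cache + incoming
    assert len(scores) == len(candidates)
    dim = len(candidates[0])
    new_cache = []
    for kernel in kernels:
        # Mixing weights for this slot: convolve the scores, then normalize.
        weights = softmax(conv1d_same(scores, kernel))
        # The slot is a weighted average of every candidate vector.
        slot = [sum(w * vec[d] for w, vec in zip(weights, candidates))
                for d in range(dim)]
        new_cache.append(slot)
    return new_cache

# Toy demo: a 4-slot cache absorbs 3 incoming tokens and stays at 4 slots.
random.seed(0)
cache = [[random.random() for _ in range(2)] for _ in range(4)]
incoming = [[random.random() for _ in range(2)] for _ in range(3)]
scores = [random.random() for _ in range(len(cache) + len(incoming))]
kernels = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
new_cache = fuse_into_cache(cache, incoming, scores, kernels)
print(len(new_cache), len(new_cache[0]))
```

Because each slot is a softmax-weighted average, no candidate is dropped outright; every old slot and every incoming token contributes to the updated cache with some weight, while the cache size stays fixed at the number of kernels.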

Code & Models

Repositories

VITA-Group/LoCoCo
jax · Official

Taxonomy

Topics: Advanced Data Compression Techniques