InfiniPot: Infinite Context Processing on Memory-Constrained LLMs

Minsoo Kim; Kyuhong Shim; Jungwook Choi; Simyung Chang

arXiv:2410.01518 · cs.CL · October 4, 2024

TL;DR

InfiniPot introduces a memory-efficient framework for large language models to handle extensive input sequences in resource-limited environments by compressing and retaining essential information without additional training.

Contribution

The paper presents InfiniPot, a novel KV cache control framework utilizing Continual Context Distillation to enable LLMs to process long contexts within fixed memory limits without retraining.

Findings

01 Outperforms models trained specifically for long contexts

02 Effectively maintains critical information with novel importance metrics

03 Enhances LLM applicability in resource-constrained scenarios

Abstract

Handling long input contexts remains a significant challenge for Large Language Models (LLMs), particularly in resource-constrained environments such as mobile devices. Our work aims to address this limitation by introducing InfiniPot, a novel KV cache control framework designed to enable pre-trained LLMs to manage extensive sequences within fixed memory constraints efficiently, without requiring additional training. InfiniPot leverages Continual Context Distillation (CCD), an iterative process that compresses and retains essential information through novel importance metrics, effectively maintaining critical data even without access to future context. Our comprehensive evaluations indicate that InfiniPot significantly outperforms models trained for long contexts in various NLP tasks, establishing its efficacy and versatility. This work represents a substantial advancement toward making…
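The iterative compress-and-retain idea described in the abstract can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the paper's implementation: the per-token scores are supplied by the caller as stand-ins for the importance metrics (which this page does not define), and the names `compress`, `process_stream`, `budget`, and `keep` are hypothetical. It only shows how a cache can stay within a fixed size while a stream of arbitrary length is consumed, without looking at future tokens.

```python
from typing import List, Sequence, Tuple

# (original token position, placeholder importance score)
Entry = Tuple[int, float]


def compress(cache: List[Entry], keep: int) -> List[Entry]:
    """Keep the `keep` highest-scoring entries, then restore positional order."""
    top = sorted(cache, key=lambda e: e[1], reverse=True)[:keep]
    return sorted(top, key=lambda e: e[0])


def process_stream(scores: Sequence[float], budget: int, keep: int) -> List[Entry]:
    """Consume a stream of per-token scores with a cache of at most `budget` entries.

    Each time the cache fills up, it is compressed down to `keep` entries
    using only the scores seen so far (no access to future tokens).
    """
    if not 0 < keep < budget:
        raise ValueError("require 0 < keep < budget")
    cache: List[Entry] = []
    for pos, score in enumerate(scores):
        cache.append((pos, score))
        if len(cache) == budget:  # memory limit reached: compress in place
            cache = compress(cache, keep)
    return cache


if __name__ == "__main__":
    # Assumed toy scores; a real system would derive them from the model.
    retained = process_stream([0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6], budget=4, keep=2)
    print(retained)
```

In this toy setting the cache never holds more than `budget` entries regardless of input length, which is the property the abstract attributes to operating within fixed memory constraints. How entries are actually scored and merged is left to the paper.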

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling