InfiniPot: Infinite Context Processing on Memory-Constrained LLMs
Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang

TL;DR
InfiniPot introduces a memory-efficient framework for large language models to handle extensive input sequences in resource-limited environments by compressing and retaining essential information without additional training.
Contribution
The paper presents InfiniPot, a novel KV cache control framework utilizing Continual Context Distillation to enable LLMs to process long contexts within fixed memory limits without retraining.
Findings
Outperforms models trained specifically for long contexts
Effectively maintains critical information with novel importance metrics
Enhances LLM applicability in resource-constrained scenarios
Abstract
Handling long input contexts remains a significant challenge for Large Language Models (LLMs), particularly in resource-constrained environments such as mobile devices. Our work aims to address this limitation by introducing InfiniPot, a novel KV cache control framework designed to enable pre-trained LLMs to manage extensive sequences within fixed memory constraints efficiently, without requiring additional training. InfiniPot leverages Continual Context Distillation (CCD), an iterative process that compresses and retains essential information through novel importance metrics, effectively maintaining critical data even without access to future context. Our comprehensive evaluations indicate that InfiniPot significantly outperforms models trained for long contexts in various NLP tasks, establishing its efficacy and versatility. This work represents a substantial advancement toward making…
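The iterative compress-and-retain loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The summary does not specify the importance metrics, so `score` below is a hypothetical placeholder supplied by the caller, and `compress`, `process_stream`, `capacity`, and `keep` are names invented for illustration. Plain `(position, token)` tuples stand in for KV cache entries.

```python
from typing import Callable, Iterable, List, Tuple

# (original position, token): a stand-in for one KV cache entry (assumption).
Entry = Tuple[int, str]


def compress(cache: List[Entry], keep: int,
             score: Callable[[Entry], float]) -> List[Entry]:
    """Keep the `keep` highest-scoring entries, preserving original order."""
    if len(cache) <= keep:
        return list(cache)
    # `score` is a hypothetical importance metric, not the paper's.
    top = sorted(cache, key=score, reverse=True)[:keep]
    return sorted(top, key=lambda entry: entry[0])


def process_stream(tokens: Iterable[str], capacity: int, keep: int,
                   score: Callable[[Entry], float]) -> List[Entry]:
    """Consume a stream of any length while holding at most `capacity` entries."""
    if not 0 < keep < capacity:
        raise ValueError("require 0 < keep < capacity")
    cache: List[Entry] = []
    for pos, tok in enumerate(tokens):
        if len(cache) == capacity:
            # Cache is full: compress using only what has been seen so far,
            # since future context is not available at this point.
            cache = compress(cache, keep, score)
        cache.append((pos, tok))
    return cache
```

Calling `process_stream(tokens, capacity=4, keep=2, score=...)` on an arbitrarily long token stream never holds more than four entries at once. An entry survives repeated compression rounds only while `score` keeps ranking it among the top `keep`.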
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
