PRISM: Efficient Long-Range Reasoning With Short-Context LLMs
Dulhan Jayalath, James Bradley Wendt, Nicholas Monath, Sandeep Tata, Beliz Gunel

TL;DR
PRISM is a novel in-context reasoning method that enables long-range reasoning with short-context LLMs, significantly reducing token usage and costs while maintaining high performance across diverse tasks.
Contribution
The paper introduces PRISM, a schema-based, token-efficient in-context approach that outperforms existing methods on long-range tasks while using much shorter contexts.
Findings
Outperforms baselines with 4x shorter contexts
Reduces costs by up to 54% using KV caches
Generalizes to new tasks with minimal schema generation
Abstract
Long-range tasks demand reasoning over long inputs. However, existing solutions are limited, e.g., long-context models require large compute budgets, parameter-efficient fine-tuning (PEFT) needs training data, and retrieval-augmented generation (RAG) entails complex task-specific designs. Though in-context approaches overcome many of these issues, methods with short-context LLMs are inefficient, trading context for processing more tokens. We introduce PRISM, a highly token-efficient in-context method based on structured schemas that outperforms baselines on diverse tasks with 4x shorter contexts. This approach produces concise outputs and efficiently leverages key-value (KV) caches to reduce costs by up to 54%. PRISM scales down to tiny contexts without increasing costs or sacrificing quality, and generalizes to new tasks with minimal effort by generating schemas from task descriptions.
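The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it describes: a short-context model reads a long input chunk by chunk and keeps a compact, structured memory instead of the full text. Everything here is an assumption made for illustration. The names `chunk_text`, `process_long_input`, and `toy_step` are hypothetical, the schema is reduced to a dictionary of named lists, and the LLM call is replaced by a toy function. It should not be read as the authors' implementation.

```python
from typing import Callable, Dict, List

# Hypothetical schema-shaped memory: field name -> list of short entries.
Memory = Dict[str, List[str]]


def chunk_text(text: str, chunk_size: int) -> List[str]:
    """Split text into chunks of at most `chunk_size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def process_long_input(
    text: str,
    chunk_size: int,
    step: Callable[[Memory, str], Memory],
    schema_fields: List[str],
) -> Memory:
    """Read the input one chunk at a time, appending each update to the memory.

    `step` stands in for an LLM call that sees the current memory and one
    chunk, and returns new entries keyed by schema field (an assumption).
    """
    memory: Memory = {field: [] for field in schema_fields}
    for chunk in chunk_text(text, chunk_size):
        update = step(memory, chunk)
        for field, entries in update.items():
            if field in memory:  # ignore fields outside the schema
                memory[field].extend(entries)
    return memory


def toy_step(memory: Memory, chunk: str) -> Memory:
    """Toy stand-in for the model: record capitalized words as entities."""
    return {"entities": [w for w in chunk.split() if w[0].isupper()]}


text = "Alice met Bob in Paris and then Carol flew home"
result = process_long_input(text, chunk_size=3, step=toy_step, schema_fields=["entities"])
print(result)
```

In this sketch the memory only grows by appending, so the text that precedes each new chunk stays unchanged between steps. A stable prefix of this kind is generally what prefix-based KV caching relies on, though how PRISM itself uses the cache is not detailed in the abstract.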
Taxonomy
Topics: Semantic Web and Ontologies · Data Mining Algorithms and Applications · Advanced Database Systems and Queries
