Semantic Caching of Contextual Summaries for Efficient Question-Answering with Language Models

Camille Couturier; Spyros Mastorakis; Haiying Shen; Saravan Rajmohan; Victor Rühle

arXiv:2505.11271·cs.CL·May 19, 2025

TL;DR

This paper presents a semantic caching method for LLM question-answering systems that stores and reuses intermediate contextual summaries across similar queries, reducing redundant computation by up to 50-60% while keeping answer accuracy comparable to full document processing.

Contribution

It introduces a novel semantic caching technique for reusing contextual summaries, reducing computational overhead in LLM-based QA workflows.

Findings

01 Reduces redundant computation by up to 50-60%.

02 Maintains answer accuracy comparable to full document processing.

03 Demonstrated on NaturalQuestions, TriviaQA, and a synthetic ArXiv dataset.

Abstract

Large Language Models (LLMs) are increasingly deployed across edge and cloud platforms for real-time question-answering and retrieval-augmented generation. However, processing lengthy contexts in distributed systems incurs high computational overhead, memory usage, and network bandwidth. This paper introduces a novel semantic caching approach for storing and reusing intermediate contextual summaries, enabling efficient information reuse across similar queries in LLM-based QA workflows. Our method reduces redundant computations by up to 50-60% while maintaining answer accuracy comparable to full document processing, as demonstrated on NaturalQuestions, TriviaQA, and a synthetic ArXiv dataset. This approach balances computational cost and response quality, critical for real-time AI assistants.
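To make the idea of reusing summaries across similar queries concrete, the following is a minimal sketch and not the authors' implementation. It assumes a cache keyed by query embeddings, a cosine-similarity threshold for deciding a hit, and a caller-supplied `summarize` function standing in for the expensive LLM summarization step. The names `SemanticSummaryCache`, `embed`, and `threshold`, and the bag-of-words embedding, are hypothetical placeholders; a real system would likely use a learned sentence encoder and a vector index.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real sentence encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticSummaryCache:
    """Hypothetical cache mapping query embeddings to contextual summaries."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed similarity cutoff for a cache hit
        self.entries: List[Tuple[Counter, str]] = []
        self.hits = 0
        self.misses = 0

    def lookup(self, query: str) -> Optional[str]:
        """Return the most similar cached summary, or None if below threshold."""
        q = embed(query)
        best_score, best_summary = 0.0, None
        for vec, summary in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_summary = score, summary
        return best_summary if best_score >= self.threshold else None

    def get_or_summarize(self, query: str, summarize: Callable[[str], str]) -> str:
        """Reuse a cached summary for a similar query, else compute and store one."""
        cached = self.lookup(query)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        summary = summarize(query)  # the expensive step the cache tries to avoid
        self.entries.append((embed(query), summary))
        return summary
```

In this sketch the threshold is the knob that trades computational cost against response quality: a lower value reuses summaries more aggressively, while a higher value falls back to full processing more often.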

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems · Information Retrieval and Search Behavior