Caching Historical Embeddings in Conversational Search
Ophir Frieder, Ida Mele, Cristina Ioana Muntean, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto

TL;DR
This paper introduces a client-side document embedding cache for conversational search. The cache exploits the temporal locality of the documents retrieved by successive queries in a conversation to improve response times and reduce backend load, reaching a cache hit rate of up to 75%.
Contribution
It proposes a novel embedding caching method with an efficient metric index, enhancing conversational search responsiveness without sacrificing answer quality.
Findings
Achieved a cache hit rate of up to 75% in experiments.
Significantly improved system responsiveness.
Reduced backend query load.
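The link between the hit rate and backend load can be made explicit under a simple accounting assumption (ours, not the paper's): each cache miss sends exactly one request to the backend and each hit sends none. With H hits and M misses over a set of utterances, the hit rate h and the fraction of backend requests that remain are:

```latex
h = \frac{H}{H + M},
\qquad
\frac{\text{backend requests with cache}}{\text{backend requests without cache}}
  = \frac{M}{H + M} = 1 - h .
```

At h = 0.75 the backend would receive one quarter of the requests it would serve without the cache. This counts requests only; a miss that fetches a larger neighborhood than a normal query may cost more per request.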
Abstract
Rapid response, namely low latency, is fundamental in search applications; it is particularly so in interactive search sessions, such as those encountered in conversational settings. An observation with the potential to reduce latency is that conversational queries exhibit temporal locality in the lists of documents they retrieve. Motivated by this observation, we propose and evaluate a client-side document embedding cache that improves the responsiveness of conversational search systems. By leveraging state-of-the-art dense retrieval models to abstract document and query semantics, we cache the embeddings of the documents retrieved for a topic introduced in the conversation, as they are likely relevant to successive queries. Our document embedding cache implements an efficient metric index, answering nearest-neighbor similarity queries by estimating the approximate result sets returned. We…
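The caching idea described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance between embeddings, a hypothetical backend callable `backend(query, k)` that returns the exact k nearest documents, and a hit test based on the triangle inequality: a query q_b is served locally when r_b + ||q_a - q_b|| <= r_a for some earlier query q_a, where r_a is the distance from q_a to the farthest document fetched for it and r_b is the distance from q_b to its k-th nearest cached document. The names `EmbeddingCache`, `k_cache` and `brute_force` are illustrative, the exhaustive scan stands in for a real metric index, and this hit criterion may differ from the exact rule used in the paper.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical backend: (query embedding, k) -> [(doc_id, embedding)], nearest first.
Backend = Callable[[Vector, int], List[Tuple[str, Vector]]]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class EmbeddingCache:
    """Sketch of a client-side document embedding cache (illustrative only)."""

    def __init__(self, backend: Backend, k_cache: int) -> None:
        self.backend = backend
        self.k_cache = k_cache  # documents fetched per miss (assumed parameter)
        self.docs: Dict[str, Vector] = {}  # doc_id -> cached embedding
        self.balls: List[Tuple[Vector, float]] = []  # (query q_a, radius r_a) per miss
        self.hits = 0
        self.misses = 0

    def _local_knn(self, q: Vector, k: int) -> List[Tuple[str, Vector]]:
        # Exhaustive scan over cached embeddings; a metric index would prune this.
        return sorted(self.docs.items(), key=lambda item: l2(q, item[1]))[:k]

    def _is_safe_hit(self, q: Vector, top: List[Tuple[str, Vector]]) -> bool:
        # Every uncached doc d has l2(d, q_a) >= r_a, hence
        # l2(d, q) >= r_a - l2(q, q_a) >= r_b: it cannot beat the local k-th result.
        r_b = l2(q, top[-1][1])
        return any(r_b + l2(q, q_a) <= r_a for q_a, r_a in self.balls)

    def search(self, q: Vector, k: int) -> List[str]:
        if not 1 <= k <= self.k_cache:
            raise ValueError("k must be between 1 and k_cache")
        top = self._local_knn(q, k)
        if len(top) == k and self._is_safe_hit(q, top):
            self.hits += 1
            return [doc_id for doc_id, _ in top]
        # Miss: fetch a larger neighborhood from the backend and record its radius.
        self.misses += 1
        fetched = self.backend(q, self.k_cache)
        for doc_id, emb in fetched:
            self.docs[doc_id] = emb
        if fetched:
            self.balls.append((q, l2(q, fetched[-1][1])))
        return [doc_id for doc_id, _ in fetched[:k]]


# Toy collection: 100 documents on a line, searched exhaustively by a stand-in backend.
collection = {f"d{i}": (float(i), 0.0) for i in range(100)}


def brute_force(q: Vector, k: int) -> List[Tuple[str, Vector]]:
    return sorted(collection.items(), key=lambda item: l2(q, item[1]))[:k]


cache = EmbeddingCache(brute_force, k_cache=20)
print(cache.search((10.0, 0.0), 3))  # miss: first utterance on a topic
print(cache.search((10.5, 0.0), 3))  # hit: follow-up stays inside the cached ball
print(cache.search((80.0, 0.0), 3))  # miss: topic shift
print(cache.hits, cache.misses)
```

Fetching k_cache documents per miss, more than the k shown to the user, widens the cached neighborhood, so follow-up utterances that drift slightly from the first query can still pass the test and be answered without contacting the backend.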
Taxonomy
Topics: Caching and Content Delivery · Advanced Image and Video Retrieval Techniques · Recommender Systems and Techniques
