CALMem: Application-Layer Dual Memory for Conversational AI
Rajendra Narayan Jena, Rajan Padmanabhan, Sankar Arumugam

TL;DR
CALMem introduces a dual memory architecture for LLM-based conversational AI, enabling virtually unlimited context retention without modifying the underlying model, through episodic and semantic memory layers with adaptive retrieval.
Contribution
The paper proposes CALMem, an application-layer dual memory system that improves context retention and retrieval in conversational AI without requiring changes to the underlying model.
Findings
Enables virtually unbounded effective context in conversations.
Implements intra-session retrieval for compacted history.
Operates as a provider-agnostic, zero-overhead application layer.
Abstract
Large language models (LLMs) operate within fixed context windows that fundamentally limit conversational continuity. When context fills, compaction discards history irreversibly; when sessions end, all memory resets to zero. Existing solutions (larger context windows, retrieval-augmented generation for knowledge bases, and memory-augmented architectures such as MemGPT) either require model modification, impose provider lock-in, or do not address the compaction continuity problem. We present CALMem (Conversational Application-Layer Memory), an application-layer dual memory architecture that gives LLM-based conversational assistants virtually unbounded effective context without any modification to the underlying model. CALMem combines two complementary memory subsystems: an episodic memory layer built on sliding-window vector embeddings of conversation history, and a semantic memory layer…
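The episodic layer described in the abstract (sliding-window vector embeddings of conversation history) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the class name `EpisodicMemory`, the `window` and `stride` parameters, and the hashed bag-of-words `toy_embed` function are all assumptions introduced here, with `toy_embed` standing in for whatever embedding model a real system would call. The semantic layer is not sketched because the abstract is truncated before describing it.

```python
import hashlib
import math
from dataclasses import dataclass, field

DIM = 64  # toy embedding dimensionality (assumption, not from the paper)


def toy_embed(text: str) -> list[float]:
    """Hashed bag-of-words vector, L2-normalised.

    Hypothetical stand-in for a real embedding model.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


@dataclass
class EpisodicMemory:
    """Hypothetical sliding-window store over conversation turns."""
    window: int = 3   # turns per embedded chunk (assumed)
    stride: int = 1   # turns between consecutive chunks (assumed)
    turns: list[str] = field(default_factory=list)
    chunks: list[tuple[str, list[float]]] = field(default_factory=list)

    def add_turn(self, turn: str) -> None:
        """Append a turn and embed the latest window when one is due."""
        self.turns.append(turn)
        n = len(self.turns)
        if n >= self.window and (n - self.window) % self.stride == 0:
            text = "\n".join(self.turns[-self.window:])
            self.chunks.append((text, toy_embed(text)))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored windows most similar to the query."""
        q = toy_embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = EpisodicMemory(window=2, stride=1)
for turn in [
    "user: my deployment target is kubernetes",
    "assistant: noted, kubernetes it is",
    "user: the database is postgres",
    "assistant: postgres recorded",
]:
    memory.add_turn(turn)

recalled = memory.retrieve("which database are we using", k=2)
```

In a design along these lines, the retrieved windows would be placed back into the prompt after compaction has dropped the original turns. Because the store sits outside the model, nothing in the sketch depends on a particular LLM provider.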
