A Grounded Memory System For Smart Personal Assistants

Felix Ocker; Jörg Deigmöller; Pavel Smirnov; Julian Eggert

arXiv:2505.06328 · cs.AI · May 13, 2025

Open Access

TL;DR

This paper proposes a grounded memory system for smart personal assistants that integrates vision-language models, knowledge graphs, and retrieval-augmented generation to support perception, memory, and question answering.

Contribution

It presents a three-component memory system that combines perception, a knowledge-graph memory enhanced by vector embeddings, and retrieval-augmented question answering for agentic AI applications.

Findings

01

Demonstrates integration of vision-language models with knowledge graphs.

02

Combines semantic search and graph query generation for question answering via retrieval-augmented generation.

03

Illustrates the system's operation and potential with a real-world example.

Abstract

A wide variety of agentic AI applications - ranging from cognitive assistants for dementia patients to robotics - demand a robust memory system grounded in reality. In this paper, we propose such a memory system consisting of three components. First, we combine Vision Language Models for image captioning and entity disambiguation with Large Language Models for consistent information extraction during perception. Second, the extracted information is represented in a memory consisting of a knowledge graph enhanced by vector embeddings to efficiently manage relational information. Third, we combine semantic search and graph query generation for question answering via Retrieval Augmented Generation. We illustrate the system's working and potential using a real-world example.
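
The second and third components can be pictured with a small sketch. The code below is not the authors' implementation: it is a toy illustration, assuming a bag-of-words stand-in for a real embedding model, hand-written facts in place of the VLM/LLM perception step, and a plain neighbor lookup in place of the graph query generation described in the abstract. All names (`GraphMemory`, `embed`, `retrieve`) are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class GraphMemory:
    """Knowledge graph whose nodes also carry an embedding of their description."""

    nodes: dict = field(default_factory=dict)  # node id -> (description, embedding)
    edges: list = field(default_factory=list)  # (subject, relation, object) triples

    def add_node(self, node_id: str, description: str) -> None:
        self.nodes[node_id] = (description, embed(description))

    def add_edge(self, subject: str, relation: str, obj: str) -> None:
        self.edges.append((subject, relation, obj))

    def semantic_search(self, query: str, k: int = 1) -> list:
        """Return the ids of the k nodes most similar to the query."""
        q = embed(query)
        ranked = sorted(
            self.nodes, key=lambda n: cosine(q, self.nodes[n][1]), reverse=True
        )
        return ranked[:k]

    def retrieve(self, query: str, k: int = 1) -> list:
        """Semantic search for entry nodes, then collect the facts touching them."""
        context = []
        for node_id in self.semantic_search(query, k):
            for s, r, o in self.edges:
                fact = f"{s} {r} {o}"
                if node_id in (s, o) and fact not in context:
                    context.append(fact)
        return context


# Hand-written facts (assumption: in the paper these come from perception).
memory = GraphMemory()
memory.add_node("keys", "a set of house keys on a ring")
memory.add_node("kitchen_table", "wooden table in the kitchen")
memory.add_node("alice", "person named Alice wearing a red jacket")
memory.add_edge("keys", "located_on", "kitchen_table")
memory.add_edge("alice", "placed", "keys")

# The retrieved facts would be passed to a language model as context.
context = memory.retrieve("where are my house keys")
print(context)
```

In this sketch the embeddings only select where to enter the graph, while the edges supply the relational facts used as context for answer generation.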

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems