A Grounded Memory System For Smart Personal Assistants
Felix Ocker, Jörg Deigmöller, Pavel Smirnov, Julian Eggert

TL;DR
This paper introduces a comprehensive grounded memory system for smart personal assistants that integrates vision-language models, knowledge graphs, and retrieval-augmented generation to enhance perception, memory, and question answering capabilities.
Contribution
It presents a novel multi-component memory system combining perception, knowledge representation, and retrieval for more realistic and effective AI assistants.
Findings
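Combines vision-language models (image captioning, entity disambiguation) with large language models to extract information consistently during perception.
Stores the extracted information in a knowledge graph enhanced by vector embeddings.
Answers questions via retrieval-augmented generation that combines semantic search with graph query generation, illustrated on a real-world example.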
Abstract
A wide variety of agentic AI applications - ranging from cognitive assistants for dementia patients to robotics - demand a robust memory system grounded in reality. In this paper, we propose such a memory system consisting of three components. First, we combine Vision Language Models for image captioning and entity disambiguation with Large Language Models for consistent information extraction during perception. Second, the extracted information is represented in a memory consisting of a knowledge graph enhanced by vector embeddings to efficiently manage relational information. Third, we combine semantic search and graph query generation for question answering via Retrieval Augmented Generation. We illustrate the system's working and potential using a real-world example.
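The memory and retrieval components described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `GroundedMemory`, its methods, the toy bag-of-words `embed` function, and the example entities (keys, mug) are all hypothetical, chosen only to show how a knowledge graph with per-node embeddings might support semantic search followed by graph lookup. A real system would use a learned embedding model, a graph database with generated queries, and an LLM to phrase the final answer.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Toy bag-of-words 'embedding' (assumption; stands in for a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class GroundedMemory:
    """Hypothetical in-memory knowledge graph whose nodes carry embeddings."""

    def __init__(self):
        self.edges = defaultdict(list)  # node -> [(relation, target node)]
        self.vectors = {}               # node -> embedding of its description

    def add_entity(self, name, description):
        self.vectors[name] = embed(description)

    def add_relation(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def semantic_search(self, query, k=1):
        """Return the k entities whose descriptions best match the query."""
        q = embed(query)
        ranked = sorted(self.vectors,
                        key=lambda n: cosine(q, self.vectors[n]),
                        reverse=True)
        return ranked[:k]

    def retrieve_context(self, query, k=1):
        """Semantic search picks entry nodes; graph edges supply the facts."""
        facts = []
        for node in self.semantic_search(query, k):
            for relation, obj in self.edges[node]:
                facts.append(f"{node} {relation} {obj}")
        return facts


# Illustrative data only; not taken from the paper.
memory = GroundedMemory()
memory.add_entity("keys", "a set of house keys on a metal ring")
memory.add_entity("mug", "a blue coffee mug")
memory.add_relation("keys", "located_on", "kitchen table")
memory.add_relation("mug", "located_in", "dishwasher")

# These facts would be passed to a language model as retrieved context.
context = memory.retrieve_context("where are my house keys")
print(context)
```

In this sketch the embeddings only locate the entry point into the graph, while the relational facts come from the edges themselves, which mirrors the division of labor between semantic search and graph queries that the abstract describes.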
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems
