Multimodal Data Storage and Retrieval for Embodied AI: A Survey

Yihao Lu; Hao Tang

arXiv:2508.13901 · cs.RO · August 20, 2025

TL;DR

This survey reviews storage and retrieval architectures for multimodal data in Embodied AI, analyzing their suitability, challenges, and future research directions to support robust, real-time autonomous systems.

Contribution

The survey systematically evaluates five storage architectures and five retrieval paradigms, identifies key bottlenecks, and proposes a research agenda for data management in Embodied AI.

Findings

01. Graph and multi-model databases are suitable for physical grounding.
02. A trade-off exists between semantic coherence and real-time responsiveness.
03. Key challenges include cross-modal integration and dynamic adaptation.

Abstract

Embodied AI (EAI) agents continuously interact with the physical world, generating vast, heterogeneous multimodal data streams that traditional management systems are ill-equipped to handle. In this survey, we first systematically evaluate five storage architectures (Graph Databases, Multi-Model Databases, Data Lakes, Vector Databases, and Time-Series Databases), focusing on their suitability for addressing EAI's core requirements, including physical grounding, low-latency access, and dynamic scalability. We then analyze five retrieval paradigms (Fusion Strategy-Based Retrieval, Representation Alignment-Based Retrieval, Graph-Structure-Based Retrieval, Generation Model-Based Retrieval, and Efficient Retrieval-Based Optimization), revealing a fundamental tension between achieving long-term semantic coherence and maintaining real-time responsiveness. Based on this comprehensive analysis,…
