Multimodal Data Storage and Retrieval for Embodied AI: A Survey
Yihao Lu, Hao Tang

TL;DR
This survey reviews storage and retrieval architectures for multimodal data in Embodied AI, analyzing their suitability and challenges and outlining future research directions toward robust, real-time autonomous systems.
Contribution
The survey systematically evaluates five storage architectures and five retrieval paradigms, identifies key bottlenecks, and proposes a research agenda for data management in Embodied AI.
Findings
Graph and multi-model databases are suitable for physical grounding.
A trade-off exists between semantic coherence and real-time responsiveness.
Key challenges include cross-modal integration and dynamic adaptation.
Abstract
Embodied AI (EAI) agents continuously interact with the physical world, generating vast, heterogeneous multimodal data streams that traditional management systems are ill-equipped to handle. In this survey, we first systematically evaluate five storage architectures (Graph Databases, Multi-Model Databases, Data Lakes, Vector Databases, and Time-Series Databases), focusing on their suitability for addressing EAI's core requirements, including physical grounding, low-latency access, and dynamic scalability. We then analyze five retrieval paradigms (Fusion Strategy-Based Retrieval, Representation Alignment-Based Retrieval, Graph-Structure-Based Retrieval, Generation Model-Based Retrieval, and Efficient Retrieval-Based Optimization), revealing a fundamental tension between achieving long-term semantic coherence and maintaining real-time responsiveness. Based on this comprehensive analysis,…
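The abstract names fusion strategy-based retrieval without detailing it. Purely as an illustration, the sketch below assumes one common variant, weighted late fusion. Each stored observation carries a vision embedding and a text embedding, similarity is computed separately per modality, and the per-modality scores are combined with a weight. The `Record` schema, the `late_fusion_search` function, and the `w_vision` parameter are hypothetical and are not taken from the survey.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Record:
    """One stored observation with per-modality embeddings (hypothetical schema)."""
    key: str
    vision: list[float]
    text: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def late_fusion_search(
    records: list[Record],
    q_vision: list[float],
    q_text: list[float],
    w_vision: float = 0.5,
    k: int = 2,
) -> list[str]:
    """Score every record per modality, fuse with a weighted sum, return top-k keys.

    This is a linear scan for illustration; a vector index would replace it in practice.
    """
    w_text = 1.0 - w_vision

    def fused(r: Record) -> float:
        return w_vision * cosine(r.vision, q_vision) + w_text * cosine(r.text, q_text)

    ranked = sorted(records, key=fused, reverse=True)
    return [r.key for r in ranked[:k]]


if __name__ == "__main__":
    store = [
        Record("mug_on_table", [1.0, 0.0], [1.0, 0.0]),
        Record("door_handle", [0.0, 1.0], [0.0, 1.0]),
        Record("red_mug", [0.9, 0.1], [0.2, 0.8]),
    ]
    print(late_fusion_search(store, q_vision=[1.0, 0.0], q_text=[1.0, 0.0]))
```

The linear scan keeps the example self-contained, but it costs O(n) per query. That per-query cost is what vector database indexes, and plausibly the efficiency-oriented retrieval methods named in the abstract, are designed to reduce. Changing `w_vision` shifts the ranking toward visual or textual agreement, which is one simple way to trade modalities off against each other.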
