KARMA: Augmenting Embodied AI Agents with Long-and-short Term Memory Systems

Zixuan Wang; Bo Yu; Junzhe Zhao; Wenhao Sun; Sai Hou; Shuai Liang; Xing Hu; Yinhe Han; Yiming Gan

arXiv:2409.14908 · cs.RO · March 24, 2025 · 2 cites

Open Access

TL;DR

KARMA introduces a dual-memory system combining long-term scene graphs and short-term dynamic memory to improve planning and efficiency in embodied AI agents performing household tasks, with demonstrated success in simulation and real-world deployment.

Contribution

The paper presents a novel memory system that integrates long-term and short-term memories to enhance embodied AI agents' task planning capabilities.

Findings

1. Improves success rates by 1.3x and 2.3x on composite and complex tasks, respectively.
2. Improves task efficiency by 3.4x and 62.7x on the same two task categories, respectively.
3. Demonstrates deployment on real-world robotic platforms.

Abstract

Embodied AI agents responsible for executing interconnected, long-sequence household tasks often face difficulties with in-context memory, leading to inefficiencies and errors in task execution. To address this issue, we introduce KARMA, an innovative memory system that integrates long-term and short-term memory modules, enhancing large language models (LLMs) for planning in embodied agents through memory-augmented prompting. KARMA distinguishes between long-term and short-term memory, with long-term memory capturing comprehensive 3D scene graphs as representations of the environment, while short-term memory dynamically records changes in objects' positions and states. This dual-memory structure allows agents to retrieve relevant past scene experiences, thereby improving the accuracy and efficiency of task planning. Short-term memory employs strategies for effective and adaptive memory…
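The dual-memory structure described above can be illustrated with a small sketch. This is not the authors' implementation: the class names, the `build_prompt` helper, the prompt layout, and the example scene are all hypothetical, and the short-term store here simply keeps the latest observation per object without modeling any replacement strategy.

```python
from dataclasses import dataclass, field


@dataclass
class LongTermMemory:
    """Hypothetical stand-in for a 3D scene graph: positioned nodes, labeled edges."""
    positions: dict = field(default_factory=dict)  # node name -> (x, y, z)
    edges: set = field(default_factory=set)        # (parent, relation, child)

    def add(self, name, position, parent=None, relation="contains"):
        self.positions[name] = position
        if parent is not None:
            self.edges.add((parent, relation, name))

    def describe(self):
        return [f"{p} {r} {c}" for p, r, c in sorted(self.edges)]


@dataclass
class ShortTermMemory:
    """Latest observed position and state per object (assumption: no eviction)."""
    latest: dict = field(default_factory=dict)     # object -> (position, state)

    def record(self, obj, position, state):
        self.latest[obj] = (position, state)

    def describe(self):
        return [f"{o} is {s} at {p}" for o, (p, s) in sorted(self.latest.items())]


def build_prompt(task, ltm, stm):
    """Assemble a memory-augmented prompt for an LLM planner (layout is illustrative)."""
    lines = ["Scene (long-term memory):"]
    lines += [f"- {fact}" for fact in ltm.describe()]
    lines.append("Recent changes (short-term memory):")
    lines += [f"- {fact}" for fact in stm.describe()]
    lines.append(f"Task: {task}")
    return "\n".join(lines)


# Hypothetical household scene
ltm = LongTermMemory()
ltm.add("kitchen", (0.0, 0.0, 0.0))
ltm.add("fridge", (1.0, 0.0, 2.0), parent="kitchen")
ltm.add("apple", (1.0, 0.9, 2.0), parent="fridge")

stm = ShortTermMemory()
stm.record("apple", (0.5, 0.9, 1.0), "sliced")

prompt = build_prompt("Put the apple in the fridge", ltm, stm)
print(prompt)
```

Keeping the two stores separate means the slowly changing scene layout and the frequently changing object states can be updated and serialized independently before being merged into a single prompt.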

Taxonomy

Topics: Human Pose and Action Recognition