SOMA: Strategic Orchestration and Memory-Augmented System for Vision-Language-Action Model Robustness via In-Context Adaptation

Zhuoran Li; Zhiyang Li; Kaijun Zhou; Jinyu Gu

arXiv:2603.24060 · cs.RO · March 30, 2026

TL;DR

SOMA enhances vision-language-action models' robustness by integrating memory, causal attribution, and in-context adaptation, significantly improving success rates in out-of-distribution robotic tasks without fine-tuning.

Contribution

SOMA combines memory-augmented retrieval, LLM orchestration, and intervention protocols to improve the robustness of frozen VLA models without parameter fine-tuning.

Findings

01. Achieves an average success rate gain of 56.6% across tested models.

02. Improves long-horizon task chaining success by 89.1%.

03. Demonstrates effectiveness on LIBERO-PRO and LIBERO-SOMA benchmarks.

Abstract

Despite the promise of Vision-Language-Action (VLA) models as generalist robotic controllers, their robustness against perceptual noise and environmental variations in out-of-distribution (OOD) tasks remains fundamentally limited by the absence of long-term memory, causal failure attribution, and dynamic intervention capability. To address this, we propose SOMA, a Strategic Orchestration and Memory-Augmented System that upgrades frozen VLA policies for robust in-context adaptation without parameter fine-tuning. Specifically, SOMA operates through an online pipeline of contrastive Dual-Memory Retrieval-Augmented Generation (RAG), an Attribution-Driven Large-Language-Model (LLM) Orchestrator, and extensible Model Context Protocol (MCP) interventions, while an offline Memory Consolidation module continuously distills the execution traces into reliable priors. Experimental evaluations…
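The abstract names four components but the listing does not show how they connect. The following is a minimal sketch of one way such a loop could be wired, not the authors' implementation. Every name in it (`Trace`, `DualMemory`, `INTERVENTIONS`, `orchestrate`) is hypothetical, the rule-based `orchestrate` function stands in for the LLM orchestrator, and a plain dictionary of callables stands in for MCP tools.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Trace:
    """One execution trace (hypothetical schema, not from the paper)."""
    embedding: list              # assumed task/observation embedding
    success: bool                # did the frozen policy succeed?
    fix: Optional[str] = None    # intervention that resolved a failure, if known


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class DualMemory:
    """Separate banks for successful and failed traces."""
    successes: list = field(default_factory=list)
    failures: list = field(default_factory=list)

    def consolidate(self, traces):
        # Offline step: file each trace into the bank matching its outcome.
        for t in traces:
            (self.successes if t.success else self.failures).append(t)

    def retrieve(self, query):
        # Contrastive retrieval: nearest success and nearest failure.
        def nearest(bank):
            return max(bank, key=lambda t: cosine(query, t.embedding), default=None)
        return nearest(self.successes), nearest(self.failures)


# Stand-in for an intervention registry; the real system exposes these via MCP.
INTERVENTIONS = {
    "denoise_image": lambda obs: {**obs, "denoised": True},
    "rephrase_instruction": lambda obs: {**obs, "instruction": obs["instruction"].lower()},
}


def orchestrate(query, memory):
    """Rule-based placeholder for the LLM orchestrator.

    If the current situation looks more like a past failure than a past
    success, return the name of the fix recorded for that failure.
    """
    good, bad = memory.retrieve(query)
    sim_good = cosine(query, good.embedding) if good else 0.0
    sim_bad = cosine(query, bad.embedding) if bad else 0.0
    if bad is not None and sim_bad > sim_good and bad.fix in INTERVENTIONS:
        return bad.fix
    return None


memory = DualMemory()
memory.consolidate([
    Trace([1.0, 0.0], success=True),
    Trace([0.0, 1.0], success=False, fix="denoise_image"),
])
chosen = orchestrate([0.1, 0.9], memory)
```

In this toy run the query lies closer to the stored failure than to the stored success, so `chosen` is `"denoise_image"`, and calling `INTERVENTIONS[chosen]` on an observation dictionary would apply it before the frozen policy acts. The policy itself is never modified, which mirrors the no-fine-tuning constraint stated in the abstract.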

Code & Models

Repositories

LZY-1021/SOMA (GitHub)