X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction

Xiaoming Ren; Ru Zhen; Chao Li; Yang Song; Qiuxia Hou; Yanhao Zhang; Peng Liu; Qi Qi; Quanlong Zheng; Qi Wu; Zhenyi Liao; Binqiang Pan; Haobo Ji; Haonan Lu

arXiv:2605.05765 · cs.CV · May 22, 2026

TL;DR

X-OmniClaw is a unified mobile agent architecture for multimodal understanding and interaction on Android, integrating perception, memory, and action for personalized, context-aware tasks.

Contribution

The paper introduces X-OmniClaw, a novel unified architecture combining perception, memory, and action modules for mobile agents, enabling complex multimodal interactions.

Findings

01. X-OmniClaw enhances interaction efficiency and task reliability across diverse scenarios.

02. The architecture provides a practical blueprint for next-generation mobile personal assistants.

Abstract

Inspired by the development of OpenClaw, there is a growing demand for mobile-based personal agents capable of handling complex and intuitive interactions. In this technical report, we introduce X-OmniClaw, a unified mobile agent designed for multimodal understanding and interaction in the Android ecosystem. This unified architecture of perception, memory, and action enables the agent to handle complex mobile tasks with high contextual awareness. Specifically, Omni Perception provides a unified multimodal ingress pipeline that integrates UI states, real-world visual contexts, and speech inputs, leveraging a temporal alignment module to decompose raw data into structured multimodal intent representations. Omni Memory leverages multimodal memory optimization to enhance personalized intelligence by integrating runtime working memory for task continuity with long-term personal memory…
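
The abstract describes Omni Perception as merging UI states, visual context, and speech through a temporal alignment module, but it does not say how that alignment is done. The following is only a minimal sketch of one plausible approach: grouping timestamped events from the three input streams into shared time windows. Every name here (`Event`, `IntentFrame`, `align_events`, `max_gap`) is hypothetical and not taken from the report or its repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """One timestamped observation from a single input stream (hypothetical type)."""
    modality: str     # assumed labels: "ui", "vision", or "speech"
    timestamp: float  # seconds since the start of the session
    payload: str      # e.g. a transcript fragment or a UI state summary


@dataclass
class IntentFrame:
    """Events from all modalities that fall in one time window (hypothetical type)."""
    start: float
    end: float
    by_modality: Dict[str, List[str]] = field(default_factory=dict)


def align_events(events: List[Event], max_gap: float = 1.0) -> List[IntentFrame]:
    """Group events into frames; a new frame starts when the gap to the
    previous event exceeds max_gap seconds (an assumed heuristic)."""
    frames: List[IntentFrame] = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if not frames or ev.timestamp - frames[-1].end > max_gap:
            frames.append(IntentFrame(start=ev.timestamp, end=ev.timestamp))
        frame = frames[-1]
        frame.end = ev.timestamp
        frame.by_modality.setdefault(ev.modality, []).append(ev.payload)
    return frames
```

Under this assumption, a spoken request, a camera observation, and a screen state that arrive within a second of one another land in the same frame, which a downstream planner could then treat as a single multimodal intent.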

Code & Models

Repositories

oppo-mente-lab/X-OmniClaw (GitHub)
