AnyUser: Translating Sketched User Intent into Domestic Robots

Songyuan Yang; Huibin Tan; Kailun Yang; Wenjing Yang; Shaowu Yang

arXiv:2604.04811·cs.RO·April 7, 2026

TL;DR

AnyUser is a multimodal robotic instruction system that translates sketches and language into executable domestic robot actions, validated through extensive benchmarks, real-world tests, and user studies.

Contribution

AnyUser introduces multimodal fusion for understanding sketch, vision, and language inputs, and a hierarchical policy for action generation, enabling intuitive, map-free robot task execution.

Findings

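01. High accuracy in interpreting diverse sketch-based commands across simulated domestic scenes

02. Successful real-world deployment on two robot platforms: a statically mounted 7-DoF arm (KUKA LBR iiwa) and a dual-arm mobile manipulator (Realman RMC-AIDAL)

03. User study shows improved usability and high task completion rates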

Abstract

We introduce AnyUser, a unified robotic instruction system for intuitive domestic task instruction via free-form sketches on camera images, optionally with language. AnyUser interprets multimodal inputs (sketch, vision, language) as spatial-semantic primitives to generate executable robot actions, requiring no prior maps or models. Novel components include multimodal fusion for understanding and a hierarchical policy for robust action generation. Efficacy is shown via extensive evaluations: (1) Quantitative benchmarks on a large-scale dataset, showing high accuracy in interpreting diverse sketch-based commands across various simulated domestic scenes. (2) Real-world validation on two distinct robotic platforms, a statically mounted 7-DoF assistive arm (KUKA LBR iiwa) and a dual-arm mobile manipulator (Realman RMC-AIDAL), performing representative tasks like targeted wiping and area…
