UNCOM: Zero-shot Context-Aware Command Understanding for Tabletop Scenarios

Antonio Galiza Cerdeira Gonzalez; Paweł Gajewski; Bipin Indurkhya

arXiv:2410.06355·cs.RO·May 11, 2026

TL;DR

UNCOM is a hybrid framework enabling zero-shot, context-aware interpretation of natural commands for robots in tabletop scenarios, integrating speech, gestures, and scene context.

Contribution

It introduces a modular, explainable system that operates without task-specific training data, combining multiple modalities for robust human-robot interaction.

Findings

01 Achieved an 82.39% success rate on real-world interaction data

02 Demonstrated robustness to noise, diversity, and ambiguity

03 Made the dataset and code publicly available for future research

Abstract

This paper presents UNCOM, a novel hybrid framework for interpreting natural human commands in tabletop scenarios. The system integrates multiple sources of information (speech, gestures, and scene context) to extract structured, actionable instructions for robots. Addressing the need for general-purpose human-robot interaction in domestic environments, UNCOM is designed for zero-shot operation, without reliance on predefined object models or training data specific to a given task. Using foundational and task-specific deep learning models, it allows out-of-the-box speech recognition, natural language understanding, gesture detection, and object segmentation. The modular architecture enhances transparency and explainability by explicitly parsing commands into object-action-target representations, enabling integration with symbolic robotic frameworks. We demonstrate the system in a…
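
To make the object-action-target representation mentioned in the abstract more concrete, here is a minimal Python sketch of how such a parsed command might be stored and handed to a symbolic planner. The class name `ParsedCommand`, its fields, the `to_predicate` method, and the predicate string format are all hypothetical illustrations and are not taken from the UNCOM code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParsedCommand:
    """Hypothetical container for an object-action-target triple."""

    action: str                   # e.g. "place"
    obj: str                      # the object being manipulated
    target: Optional[str] = None  # assumption: some commands have no target

    def to_predicate(self) -> str:
        """Render the triple as a predicate string (illustrative format only)."""
        args = [self.obj] if self.target is None else [self.obj, self.target]
        return f"{self.action}({', '.join(args)})"


# Example: a command such as "put the red cup on the plate"
command = ParsedCommand(action="place", obj="red_cup", target="plate")
print(command.to_predicate())
```

Keeping the triple explicit, rather than passing free text downstream, is the kind of structure that lets each stage of a modular pipeline be inspected on its own.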
