SENTINEL: A Fully End-to-End Language-Action Model for Humanoid Whole Body Control

Yuxuan Wang; Haobin Jiang; Shiqing Yao; Ziluo Ding; Zongqing Lu

arXiv:2511.19236 · cs.RO · November 25, 2025

TL;DR

SENTINEL is an end-to-end model that maps language commands directly to humanoid whole-body actions, showing strong semantic understanding and stable execution both in simulation and in real-world deployment.

Contribution

The paper introduces SENTINEL, a novel fully end-to-end language-action model for humanoid control that bypasses traditional modular pipelines and demonstrates effective real-world deployment.

Findings

01 Strong semantic understanding demonstrated

02 Stable execution in simulation and in the real world

03 Supports multi-modal input extensions

Abstract

Existing humanoid control systems often rely on teleoperation or modular generation pipelines that separate language understanding from physical execution. However, the former is entirely human-driven, and the latter lacks tight alignment between language commands and physical behaviors. In this paper, we present SENTINEL, a fully end-to-end language-action model for humanoid whole-body control. We construct a large-scale dataset by tracking human motions in simulation using a pretrained whole body controller, combined with their text annotations. The model directly maps language commands and proprioceptive inputs to low-level actions without any intermediate representation. The model generates action chunks using flow matching, which can be subsequently refined by a residual action head for real-world deployment. Our method exhibits strong semantic understanding and stable execution on…
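
The abstract's description of generating action chunks with flow matching and then refining them with a residual action head can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the chunk length and action dimension, the names `toy_velocity`, `sample_action_chunk`, and `toy_residual_head`, the plain Euler integrator, and the closed-form stand-in for what would be a learned velocity network.

```python
import numpy as np

CHUNK_LEN, ACTION_DIM = 8, 4  # hypothetical sizes, not from the paper


def toy_velocity(x_t, t, cond):
    """Stand-in for a learned velocity network v(x_t, t | cond).

    For straight-line paths x_t = (1 - t) * x_0 + t * x_1, the ideal
    velocity toward a known target x_1 is (x_1 - x_t) / (1 - t).
    The "target" here is a fixed function of the conditioning vector.
    """
    target = np.tanh(cond.mean()) * np.ones((CHUNK_LEN, ACTION_DIM))
    return (target - x_t) / max(1.0 - t, 1e-6)


def sample_action_chunk(velocity_fn, cond, num_steps=10, rng=None):
    """Integrate dx/dt = v(x, t | cond) from noise (t=0) to actions (t=1)."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = rng.standard_normal((CHUNK_LEN, ACTION_DIM))  # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt, cond)  # explicit Euler step
    return x


def toy_residual_head(chunk, proprio):
    """Hypothetical residual head: a small additive correction per action dim."""
    return 0.01 * np.tanh(proprio[:ACTION_DIM])  # broadcasts over the chunk


# Conditioning: a placeholder language embedding plus proprioceptive state.
lang_emb = np.full(16, 0.5)
proprio = np.zeros(12)
cond = np.concatenate([lang_emb, proprio])

chunk = sample_action_chunk(toy_velocity, cond)
refined = chunk + toy_residual_head(chunk, proprio)
```

In this toy setup the sampler transports the noise sample to the conditioning-dependent target, and the residual term is added on top of the generated chunk. A real system would replace both stand-in functions with trained networks.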

Taxonomy

Topics: Human Motion and Animation · Social Robot Interaction and HRI · Robot Manipulation and Learning