OAT: Ordered Action Tokenization
Chaoqi Liu, Xiaoshen Han, Jiawei Gao, Yue Zhao, Haonan Chen, Yilun Du

TL;DR
OAT is a learned action tokenizer that discretizes action chunks into an ordered token sequence, enabling efficient, structured, and flexible autoregressive robot control and improving performance across diverse tasks.
Contribution
The paper proposes Ordered Action Tokenization (OAT), a learned tokenizer that satisfies three desiderata (high compression, total decodability, and left-to-right causal ordering), advancing discrete action modeling for autoregressive robot policies.
Findings
OAT outperforms prior tokenization schemes and diffusion baselines.
OAT enables anytime inference trade-offs between cost and fidelity.
OAT demonstrates consistent improvements across 20+ tasks in simulation and real-world settings.
Abstract
Autoregressive policies offer a compelling foundation for scalable robot learning by enabling discrete abstraction, token-level reasoning, and flexible inference. However, applying autoregressive modeling to continuous robot actions requires an effective action tokenization scheme. Existing approaches either rely on analytical discretization methods that produce prohibitively long token sequences, or on learned latent tokenizers that lack structure, limiting their compatibility with next-token prediction. In this work, we identify three desiderata for action tokenization - high compression, total decodability, and a left-to-right causally ordered token space - and introduce Ordered Action Tokenization (OAT), a learned action tokenizer that satisfies all three. OAT discretizes action chunks into an ordered sequence of tokens using a transformer with registers, finite scalar quantization, and…
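The abstract names finite scalar quantization as one of OAT's components. The following is a minimal sketch of that general technique, not the authors' implementation: each latent dimension is squashed to a bounded range, rounded to a small fixed number of levels, and the resulting digits are combined into a single token id with a mixed-radix encoding. The function names, the level configuration `[8, 5, 5, 5]`, and the use of NumPy are assumptions made for illustration. The sketch also omits the straight-through gradient estimator that a trainable version would need.

```python
import numpy as np


def _mixed_radix_basis(levels):
    # Place value of each dimension, e.g. levels [8, 5, 5, 5] -> [1, 8, 40, 200].
    return np.concatenate(([1], np.cumprod(levels[:-1])))


def fsq_quantize(z, levels):
    """Simplified finite scalar quantization (illustrative sketch).

    z:      array of shape (..., d) holding continuous latents.
    levels: number of quantization levels per latent dimension (length d).
    Returns per-dimension integer digits and one token id per latent vector.
    """
    z = np.asarray(z, dtype=np.float64)
    levels = np.asarray(levels, dtype=np.int64)
    # Squash each dimension into (0, 1), then scale to [0, L - 1] and round.
    unit = (np.tanh(z) + 1.0) / 2.0
    digits = np.rint(unit * (levels - 1)).astype(np.int64)
    # Combine the digits into a single integer id (mixed-radix encoding).
    token_ids = (digits * _mixed_radix_basis(levels)).sum(axis=-1)
    return digits, token_ids


def fsq_ids_to_digits(token_ids, levels):
    """Invert the mixed-radix encoding: token ids -> per-dimension digits."""
    levels = np.asarray(levels, dtype=np.int64)
    token_ids = np.asarray(token_ids, dtype=np.int64)
    return (token_ids[..., None] // _mixed_radix_basis(levels)) % levels


def fsq_digits_to_values(digits, levels):
    """Map integer digits back to quantized values in [-1, 1]."""
    levels = np.asarray(levels, dtype=np.int64)
    return digits / (levels - 1) * 2.0 - 1.0


# Hypothetical configuration: 4 latent dimensions, implicit codebook of 8*5*5*5 ids.
LEVELS = [8, 5, 5, 5]
rng = np.random.default_rng(0)
latents = rng.normal(size=(6, 4))  # e.g. 6 token slots, 4 dims each
digits, token_ids = fsq_quantize(latents, LEVELS)
```

Because the codebook is implicit in the rounding grid, every integer id in the range maps back to a valid quantized vector, which is one way a tokenizer can keep every token sequence decodable.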
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis
