PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators

Kuo-Hao Zeng; Zichen Zhang; Kiana Ehsani; Rose Hendrix; Jordi Salvador; Alvaro Herrasti; Ross Girshick; Aniruddha Kembhavi; Luca Weihs

arXiv:2406.20083 · cs.RO · July 1, 2024

TL;DR

PoliFormer is a scalable, transformer-based indoor navigation system trained entirely in simulation that generalizes well to real-world robots, achieving state-of-the-art success rates and versatile downstream capabilities.

Contribution

The paper introduces PoliFormer, a novel transformer-based RL navigation agent trained at scale, demonstrating superior real-world generalization and multi-task adaptability without finetuning.

Findings

01. Achieves an 85.5% success rate in object goal navigation on the CHORES-S benchmark.

02. Outperforms previous methods by 28.5 percentage points (absolute) in success rate on that benchmark.

03. Extensible to multiple downstream navigation tasks without finetuning.
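
The two headline numbers can be related with a short calculation. The sketch below assumes, as the abstract states, that the 28.5% figure is an absolute gain in success rate; the baseline and relative gain it prints are implied by that arithmetic rather than quoted from the paper, and the variable names are illustrative only.

```python
# Figures reported on this page for CHORES-S object goal navigation.
poliformer_success = 85.5      # success rate, in percent
absolute_gain = 28.5           # improvement, in percentage points

# Implied by subtraction (an inference, not a number quoted from the paper).
implied_baseline = poliformer_success - absolute_gain

# Relative gain over that implied baseline, as a fraction.
relative_gain = absolute_gain / implied_baseline

print(f"implied previous best: {implied_baseline:.1f}%")
print(f"relative improvement: {relative_gain:.0%}")
```

Under that assumption, an absolute gain of 28.5 points corresponds to a much larger relative gain, which is why the distinction between the two readings matters.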

Abstract

We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained end-to-end with reinforcement learning at scale that generalizes to the real world without adaptation despite being trained purely in simulation. PoliFormer uses a foundational vision transformer encoder with a causal transformer decoder enabling long-term memory and reasoning. It is trained for hundreds of millions of interactions across diverse environments, leveraging parallelized, multi-machine rollouts for efficient training with high throughput. PoliFormer is a masterful navigator, producing state-of-the-art results across two distinct embodiments, the LoCoBot and Stretch RE-1 robots, and four navigation benchmarks. It breaks through the plateaus of previous work, achieving an unprecedented 85.5% success rate in object goal navigation on the CHORES-S benchmark, a 28.5% absolute improvement.…
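
The abstract mentions a causal transformer decoder. As a rough illustration of what "causal" means in this setting, the sketch below computes attention weights over the timesteps of an episode so that step t can attend only to steps 0..t. It is a generic, minimal example of causal masking and not PoliFormer's implementation; the function name, the 3-step episode, and the uniform scores are all hypothetical.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over scores[t][s], with future steps s > t masked out."""
    weights = []
    for t, row in enumerate(scores):
        visible = row[: t + 1]                     # steps 0..t only
        peak = max(visible)                        # subtract max for numerical stability
        exps = [math.exp(x - peak) for x in visible]
        total = sum(exps)
        # Future steps receive exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (len(row) - t - 1))
    return weights

# Hypothetical raw attention scores for a 3-step episode (uniform for illustration).
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
weights = causal_attention_weights(scores)

for t, row in enumerate(weights):
    print(t, [round(w, 3) for w in row])
```

Each row sums to one, and every entry above the diagonal is zero, so the representation at a given step depends only on the current and earlier observations.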

Taxonomy

Topics: Artificial Intelligence in Games · Multi-Agent Systems and Negotiation

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Multi-Head Attention · Vision Transformer