Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Micah Carroll; Jessy Lin; Orr Paradise; Raluca Georgescu; Mingfei Sun; David Bignell; Stephanie Milani; Katja Hofmann; Matthew Hausknecht; Anca Dragan; Sam Devlin

arXiv:2204.13326·cs.LG·December 13, 2022·1 cites

TL;DR

This paper introduces FlexiBiT, a unified bidirectional transformer framework for specifying models that can be trained on many sequential decision-making tasks. A single FlexiBiT model performs comparably to or better than specialized models, and fine-tuning on a specific task improves performance further.

Contribution

The paper presents FlexiBiT, a novel transformer-based framework that unifies multiple sequential decision tasks into a single model, enabling flexible inference and improved performance.

Findings

01

A single FlexiBiT model performs on par with or better than specialized models across tasks.

02

A single model can handle behavior cloning, offline RL, inverse dynamics, and waypoint conditioning.

03

Fine-tuning FlexiBiT enhances task-specific performance.

Abstract

Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the FlexiBiT framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks. We show that a single FlexiBiT model is simultaneously capable of carrying out many tasks with performance similar to or better than specialized models. Additionally, we show that performance can be further improved by fine-tuning our general model on specific tasks of interest.
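
The correspondence between tasks and maskings described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the three-row layout (states, actions, returns-to-go), the function names, and the exact choice of which tokens are visible for each task are assumptions made here for illustration. In each mask, True marks a token shown to the model and False marks a token hidden from it, from which the prediction targets are drawn.

```python
import numpy as np

# Hypothetical layout: one row per token type, one column per timestep.
STATE, ACTION, RETURN = 0, 1, 2


def behavior_cloning_mask(T, t):
    """States up to t and actions before t are visible; action t is the target."""
    m = np.zeros((3, T), dtype=bool)
    m[STATE, : t + 1] = True
    m[ACTION, :t] = True
    return m


def offline_rl_mask(T, t):
    """Behavior cloning plus visible returns-to-go (an assumed conditioning scheme)."""
    m = behavior_cloning_mask(T, t)
    m[RETURN, : t + 1] = True
    return m


def inverse_dynamics_mask(T, t):
    """Only states t and t+1 are visible; action t is the target."""
    m = np.zeros((3, T), dtype=bool)
    m[STATE, t] = True
    m[STATE, t + 1] = True
    return m


def waypoint_mask(T, t, k):
    """Behavior cloning plus one visible future state k > t (the waypoint)."""
    m = behavior_cloning_mask(T, t)
    m[STATE, k] = True
    return m


def random_mask(T, p, rng):
    """Training-style mask: each token is hidden independently with probability p."""
    return rng.random((3, T)) >= p


if __name__ == "__main__":
    T = 5
    for name, mask in [
        ("behavior cloning", behavior_cloning_mask(T, 2)),
        ("offline RL", offline_rl_mask(T, 2)),
        ("inverse dynamics", inverse_dynamics_mask(T, 2)),
        ("waypoint", waypoint_mask(T, 1, 4)),
        ("random", random_mask(T, 0.5, np.random.default_rng(0))),
    ]:
        print(name)
        print(mask.astype(int))
```

Under this view, a single model trained to fill in hidden tokens under many different masks can be queried for any of these tasks at inference time simply by choosing the corresponding mask.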

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems