Transformers and Slot Encoding for Sample Efficient Physical World Modelling

Francesco Petri; Luigi Asprino; Aldo Gangemi

arXiv:2405.20180·cs.LG·May 31, 2024

TL;DR

This paper introduces a novel neural architecture combining Transformers with slot-attention to improve sample efficiency and object-based scene understanding in world modelling from video data.

Contribution

It presents a new architecture that integrates Transformers with slot-attention, enhancing object-based scene representation and sample efficiency in world modelling tasks.

Findings

01. Improved sample efficiency over existing methods

02. Reduced performance variation across training examples

03. Enhanced object-based scene understanding

Abstract

World modelling, i.e. building a representation of the rules that govern the world so as to predict its evolution, is an essential ability for any agent interacting with the physical world. Recent applications of the Transformer architecture to the problem of world modelling from video input show notable improvements in sample efficiency. However, existing approaches tend to work only at the image level, thus disregarding the fact that the environment is composed of objects interacting with each other. In this paper, we propose an architecture combining Transformers for world modelling with the slot-attention paradigm, an approach for learning representations of objects appearing in a scene. We describe the resulting neural architecture and report experimental results showing an improvement over the existing solutions in terms of sample efficiency and a reduction of the variation of the…
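To make the slot-attention idea mentioned in the abstract more concrete, the following is a minimal NumPy sketch of one attention iteration in which a small set of slots competes to explain a set of input features. It is not the authors' architecture: the function name `slot_attention_step`, the random projection matrices, and all sizes are assumptions chosen for illustration, and the learned GRU, MLP, and layer normalisation of the usual formulation are replaced here by a plain weighted-mean update.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_attention_step(slots, inputs, w_q, w_k, w_v, eps=1e-8):
    """One simplified slot-attention iteration (hypothetical helper).

    slots:  (num_slots, dim) current slot vectors
    inputs: (num_inputs, dim) input features, e.g. flattened image patches
    """
    q = slots @ w_q    # queries come from the slots
    k = inputs @ w_k   # keys come from the inputs
    v = inputs @ w_v   # values come from the inputs
    logits = k @ q.T / np.sqrt(q.shape[-1])  # (num_inputs, num_slots)
    # Softmax over the slot axis: slots compete for each input feature.
    attn = softmax(logits, axis=1)
    # Renormalise per slot over inputs so each update is a weighted mean.
    weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
    updates = weights.T @ v                  # (num_slots, dim)
    # Assumption: the slot is overwritten directly instead of a GRU update.
    return updates, attn

rng = np.random.default_rng(0)
dim, num_inputs, num_slots = 8, 16, 3
inputs = rng.normal(size=(num_inputs, dim))
slots = rng.normal(size=(num_slots, dim))
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

for _ in range(3):  # a few refinement iterations
    slots, attn = slot_attention_step(slots, inputs, w_q, w_k, w_v)
```

The design point to notice is the axis of the softmax: normalising over slots rather than over inputs is what makes the slots compete, so that each input feature is mostly claimed by one slot.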

Code & Models

Repositories

torchipeppo/transformers-and-slot-encoding-for-wm (PyTorch, official)

Taxonomy

Topics: Functional Brain Connectivity Studies

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections