Exploring the Learning Capabilities of Language Models using LEVERWORLDS
Eitan Wagner, Amir Feder, Omri Abend

TL;DR
This paper introduces LEVERWORLDS, a framework for evaluating the sample efficiency of various learning methods, including Transformers, in physics-inspired worlds whose instances are expressed in natural language. It finds that Transformers generally succeed but are considerably less sample efficient than classic methods.
Contribution
The paper presents LEVERWORLDS, a novel controlled environment for assessing learning algorithms' sample complexity and compares Transformers with classic methods in this setting.
Findings
Transformers generally succeed but are less sample efficient.
Classic methods outperform Transformers in sample efficiency.
Transformers show potential but currently struggle with the task.
Abstract
Learning a model of a stochastic setting often involves learning both general structure rules and specific properties of the instance. This paper investigates the interplay between learning the general and the specific in various learning methods, with emphasis on sample efficiency. We design a framework called LEVERWORLDS, which allows the generation of simple physics-inspired worlds that follow a similar generative process with different distributions, and their instances can be expressed in natural language. These worlds allow for controlled experiments to assess the sample complexity of different learning methods. We experiment with classic learning algorithms as well as Transformer language models, both with fine-tuning and In-Context Learning (ICL). Our general finding is that (1) Transformers generally succeed in the task; but (2) they are considerably less sample efficient…
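To make "sample efficiency" concrete, here is a minimal sketch under loudly hypothetical assumptions: a one-parameter toy world (not the paper's LEVERWORLDS generator) whose outcomes are rendered as English sentences, and a classic maximum-likelihood learner whose error is tracked as the number of observed samples grows. The function names, the sentence templates, and the single "tip left" probability are illustrative inventions, not the authors' code or setup.

```python
import random


def sample_world(p_left, n, seed=0):
    """Draw n outcomes from a hypothetical one-parameter toy world.

    This is NOT the LEVERWORLDS generator; it only mimics the idea that each
    instance of a stochastic world can be rendered as a natural-language sentence.
    """
    rng = random.Random(seed)  # independent, reproducible generator
    return [
        "The lever tipped to the left."
        if rng.random() < p_left
        else "The lever tipped to the right."
        for _ in range(n)
    ]


def mle_estimate(sentences):
    """Maximum-likelihood estimate of P(left): the fraction of 'left' outcomes."""
    if not sentences:
        raise ValueError("need at least one observation")
    lefts = sum("left" in s for s in sentences)
    return lefts / len(sentences)


def learning_curve(p_left, sizes, trials=200):
    """Mean absolute error of the MLE at each sample size, averaged over trials."""
    curve = {}
    for n in sizes:
        errors = [
            abs(mle_estimate(sample_world(p_left, n, seed=t)) - p_left)
            for t in range(trials)
        ]
        curve[n] = sum(errors) / trials
    return curve


if __name__ == "__main__":
    for n, err in learning_curve(p_left=0.7, sizes=[10, 100, 1000]).items():
        print(f"n={n:5d}  mean |error| = {err:.3f}")
```

A learner that bakes in the right structure (here, a single Bernoulli parameter) reaches low error with few samples; in a comparison of this kind, a more general learner such as a fine-tuned Transformer would be judged by how many sentences it needs to reach a similar error.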
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding
