A Latent Space Theory for Emergent Abilities in Large Language Models
Hui Jiang

TL;DR
This paper proposes a latent space framework suggesting that the emergent abilities of large language models stem from Bayesian inference on the sparse joint distribution of language meanings, linking language structure to model capabilities.
Contribution
It introduces a novel latent space theory that explains emergent abilities in LLMs as Bayesian inference on sparse language distributions, offering a new perspective on model capabilities.
Findings
Emergent abilities are linked to Bayesian inference on language distributions.
Languages exhibit a sparse joint distribution with peaked correlations.
The theory explains abilities like in-context learning and chain-of-thought prompting.
Abstract
Languages are not created randomly but rather to communicate information. There is a strong association between languages and their underlying meanings, resulting in a sparse joint distribution that is heavily peaked according to their correlations. Moreover, these peak values happen to match with the marginal distribution of languages due to the sparsity. With the advent of LLMs trained on big data and large models, we can now precisely assess the marginal distribution of languages, providing a convenient means of exploring the sparse structures in the joint distribution for effective inferences. In this paper, we categorize languages as either unambiguous or ε-ambiguous and present quantitative results to demonstrate that the emergent abilities of LLMs, such as language understanding, in-context learning, chain-of-thought prompting, and effective instruction fine-tuning, can…
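The abstract's claim that the peaks of a sparse joint distribution coincide with the marginal can be illustrated with a toy table. The sketch below is not the paper's construction: the 4-by-3 joint table, the `soften` helper, and the reading of ε-ambiguity as "a fraction ε of each message's probability mass leaks to other intentions" are all assumptions made here for illustration.

```python
import numpy as np

# Hypothetical toy joint p(x, theta): 4 messages (rows), 3 intentions (columns).
# Each message is tied to exactly one intention, so every row has one nonzero entry.
joint = np.array([
    [0.30, 0.00, 0.00],
    [0.00, 0.25, 0.00],
    [0.00, 0.00, 0.20],
    [0.00, 0.25, 0.00],
])

marginal_x = joint.sum(axis=1)            # p(x), the quantity an LLM estimates
peaks = joint.max(axis=1)                 # peak of the joint for each message
posterior = joint / marginal_x[:, None]   # Bayes' rule: p(theta | x)


def soften(joint, eps):
    """Assumed toy model of ambiguity: move a fraction eps of each
    message's mass uniformly onto the other intentions."""
    n_intentions = joint.shape[1]
    row_mass = joint.sum(axis=1, keepdims=True)
    one_hot = joint / row_mass
    spread = (1.0 - one_hot) / (n_intentions - 1)
    return row_mass * ((1.0 - eps) * one_hot + eps * spread)


soft_joint = soften(joint, eps=0.1)
soft_posterior = soft_joint / soft_joint.sum(axis=1, keepdims=True)
```

In the unambiguous table, `peaks` equals `marginal_x` and each row of `posterior` is one-hot, so knowing the marginal is enough to recover the intention. In the softened table the marginal is unchanged, but the most likely intention for each message has posterior probability `1 - eps` rather than 1.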
Peer Reviews
Decision · Submitted to ICLR 2024
- The authors discuss important concepts that are yet to be explained (in-context learning, CoT, instruction finetuning).
- The paper is well written and straightforward to follow.
- While additional concepts (chain of thought, instruction finetuning) are addressed in the paper compared with prior work, there does not seem to be sufficient novelty in the theoretical results. Xie, 2022 already defined a framework for explaining in-context learning as state (or intention) estimation.
- The chain of thought prompts section is not rigorously defined and does not build on the intention setup for in-context learning (Sec. 2, 3, 5). The estimation of the intent for instruction fine
The problem itself is very important.
I agree that the emergent behaviors of LLMs need a thorough scientific investigation, but this paper says almost nothing about these emergent abilities. What is important here is that LLMs seem to solve a new task that is not contained in the training data: that is what is called emergent. However, just finding a latent "intention" (quite loosely defined in this paper) cannot explain this behavior, because the new emergent intention just doesn't exist so far. Therefore, in spite of elementary mathematic
1. This paper provides a new look at the role of the link between language and underlying intentions in enabling emergent properties. It does so using both theoretical arguments and experiments.
2. The formal results are sound, as far as I could tell (though with some questions, as described below).
1. It remains unclear how the proposed account substantially improves over Xie et al 2022, which already explained ICL in terms of the recovery of an underlying state / intention. The overall claim, and the explanation of in-context learning in Section 5, are similar to Xie et al. 2022. The paper additionally provides sections about Chain-of-Thought prompting (Section 6) and instruction finetuning (Section 7), but they are quite informal and unspecific. The arguments in Section 6 are informal, u
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms
