Exploring the LLM Journey from Cognition to Expression with Linear Representations

Yuzi Yan; Jialian Li; Yipin Zhang; Dong Yan

arXiv:2405.16964 · cs.CL · November 11, 2024 · 1 citation

Open Access

TL;DR

This paper investigates how large language models develop cognitive and expressive abilities across three training phases (Pretraining, SFT, and RLHF). It shows that the two abilities develop in sequence and are correlated, and it examines strategies that may strengthen them.

Contribution

The paper introduces a framework for analyzing LLM capabilities through linear representations, traces how these capabilities develop over training, and examines the theoretical underpinnings of those developmental trajectories.

Findings

01

Cognitive abilities mainly develop during pretraining.

02

Expressive abilities improve during fine-tuning and RLHF.

03

Strategies such as few-shot learning can help close the gap between cognitive and expressive abilities.
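In this context, few-shot learning usually means in-context demonstrations. Worked examples are placed in the prompt ahead of the real query, so the model's output can change without any parameter update. The summary does not give the paper's prompt template. The `Q:`/`A:` layout and the name `build_few_shot_prompt` are assumptions made for illustration only.

```python
def build_few_shot_prompt(demos, query, k=3):
    """Prepend up to k (question, answer) demonstrations to the query.

    No model weights change: the demonstrations only alter the input context.
    """
    parts = []
    for question, answer in demos[:k]:
        parts.append(f"Q: {question}\nA: {answer}")
    # The final slot is left open for the model to complete.
    parts.append(f"Q: {query}\nA:")
    return "\n\n".join(parts)


demos = [("2 + 2 = ?", "4"), ("3 + 5 = ?", "8"), ("7 - 4 = ?", "3")]
prompt = build_few_shot_prompt(demos, "9 - 2 = ?", k=2)
print(prompt)
```

Varying `k` while the model stays fixed is one way to check whether added demonstrations, rather than further training, improve the model's word-level output on a task.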

Abstract

This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in large language models (LLMs), with a specific focus on Baichuan-7B and Baichuan-33B, two models from an advanced bilingual (Chinese and English) LLM series. We define and explore the models' cognitive and expressive capabilities through linear representations across three critical phases: Pretraining, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF). Cognitive capability is defined as the quantity and quality of information conveyed by the neuron output vectors within the network, analogous to neural signal processing in human cognition. Expressive capability is defined as the model's capability to produce word-level output. Our findings unveil a sequential development pattern, where cognitive abilities are largely established during Pretraining,…
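The abstract pairs two measurements: a cognitive score read off the network's internal neuron output vectors, and an expressive score read off its word-level output. One simple way to compare them, assumed here and not confirmed as the paper's procedure, is to score the same task both ways at each checkpoint (for example after Pretraining, SFT, and RLHF). The per-checkpoint difference then serves as a gap, and a Pearson correlation summarizes how the two scores move together. The scores in this sketch are made-up placeholders, not results from the paper, and `capability_gaps` and `pearson_r` are hypothetical helper names.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def capability_gaps(cognitive, expressive):
    """Per-checkpoint gap: decodable accuracy that the generated output fails to realize."""
    return [c - e for c, e in zip(cognitive, expressive)]


# Placeholder scores for four hypothetical checkpoints; NOT results from the paper.
cognitive = [0.5, 0.6, 0.7, 0.8]   # e.g. linear-probe accuracy on hidden states
expressive = [0.1, 0.3, 0.5, 0.7]  # e.g. accuracy of the model's generated answers

gaps = capability_gaps(cognitive, expressive)
r = pearson_r(cognitive, expressive)
print([round(g, 2) for g in gaps], round(r, 3))
```

Pearson's r captures only linear association, and with a handful of checkpoints it is a descriptive summary rather than a significance test.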

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Business Process Modeling and Analysis

Methods: Focus · Shrink and Fine-Tune