Exploring Information Processing in Large Language Models: Insights from Information Bottleneck Theory

Zhou Yang; Zhengyu Qi; Zhaochun Ren; Zhikai Jia; Haizhou Sun; Xiaofei Zhu; Xiangwen Liao

arXiv:2501.00999·cs.CL·January 7, 2025·2 cites

TL;DR

This paper investigates how large language models process information using the Information Bottleneck Theory, revealing their compression and extraction mechanisms, and introduces novel methods to improve reasoning and inference efficiency.

Contribution

It proposes a non-training strategy to define task spaces in LLMs and introduces two new approaches, IC-ICL and TS-FT, to enhance performance and inference speed.

Findings

01. LLMs compress input into task-specific spaces such as sentiment or topic.

02. They extract relevant information at critical moments for accurate predictions.

03. IC-ICL improves reasoning and speeds up inference by over 40%.

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks by understanding input information and predicting corresponding outputs. However, the internal mechanisms by which LLMs comprehend input and make effective predictions remain poorly understood. In this paper, we explore the working mechanism of LLMs in information processing from the perspective of Information Bottleneck Theory. We propose a non-training construction strategy to define a task space and identify the following key findings: (1) LLMs compress input information into specific task spaces (e.g., sentiment space, topic space) to facilitate task understanding; (2) they then extract and utilize relevant information from the task space at critical moments to generate accurate predictions. Based on these insights, we introduce two novel approaches: an Information Compression-based…
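
For orientation, the classical Information Bottleneck objective that the abstract builds on can be written as below. This is the standard textbook form, not a formula quoted from the paper; reading X as the input text, Z as an intermediate representation (for example, a point in a "task space"), and Y as the prediction target is an assumption made here for illustration, since the truncated abstract does not state how the authors instantiate these variables.

```latex
% Standard Information Bottleneck Lagrangian (textbook form).
% Assumed reading for this paper: X = input, Z = compressed representation, Y = target.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y), \qquad \beta > 0
% I(X; Z): how much of the input the representation retains (compression term).
% I(Z; Y): how much the representation tells us about the target (relevance term).
% \beta trades compression against predictive relevance.
```

Under this reading, finding (1) corresponds to keeping I(X; Z) small by compressing the input into a task-specific space, and finding (2) corresponds to keeping I(Z; Y) large by extracting the task-relevant information for prediction.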

Taxonomy

Topics: Topic Modeling

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings