Investigating the Impact of Model Complexity in Large Language Models

Jing Luo; Huiyuan Wang; Weiran Huang

arXiv:2410.00699·cs.LG·October 2, 2024

Open Access

TL;DR

Using theoretical analysis and HMM-based experiments, this paper studies how model complexity affects the downstream performance of large language models. It reports a double descent pattern in the risk and finds that the sweet spot lies at minimal complexity.

Contribution

It introduces a novel HMM-based modeling approach for autoregressive LLMs and provides a theoretical analysis of the relationship between model complexity and generalization, highlighting a double descent pattern.
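As a rough illustration of what modeling a token sequence with an HMM means, the sketch below samples hidden states and observed tokens from a small discrete HMM. The function name `sample_hmm` and all parameter values are hypothetical toy choices, not the construction used in the paper.

```python
import numpy as np


def sample_hmm(transition, emission, initial, length, rng):
    """Sample hidden states and observed tokens from a discrete HMM."""
    n_states, n_tokens = emission.shape
    states = np.empty(length, dtype=int)
    tokens = np.empty(length, dtype=int)
    state = rng.choice(n_states, p=initial)
    for t in range(length):
        states[t] = state
        # The observed token depends only on the current hidden state.
        tokens[t] = rng.choice(n_tokens, p=emission[state])
        # The next hidden state depends only on the current hidden state.
        state = rng.choice(n_states, p=transition[state])
    return states, tokens


# Toy parameters (assumed for illustration): 2 hidden states, 3 token types.
transition = np.array([[0.9, 0.1], [0.2, 0.8]])
emission = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
initial = np.array([0.5, 0.5])

rng = np.random.default_rng(0)
states, tokens = sample_hmm(transition, emission, initial, length=10, rng=rng)
```

Only `tokens` would be visible to a language model; `states` plays the role of the latent structure that the generated text depends on.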

Findings

01. Risk initially increases then decreases with model complexity.

02. The optimal model complexity occurs at the lowest possible size.

03. Experimental results align with theoretical predictions.

Abstract

Large Language Models (LLMs) based on the pre-trained fine-tuning paradigm have become pivotal in solving natural language processing tasks, consistently achieving state-of-the-art performance. Nevertheless, the theoretical understanding of how model complexity influences fine-tuning performance remains challenging and has not been well explored yet. In this paper, we focus on autoregressive LLMs and propose to employ Hidden Markov Models (HMMs) to model them. Based on the HMM modeling, we investigate the relationship between model complexity and the generalization capability in downstream tasks. Specifically, we consider a popular tuning paradigm for downstream tasks, head tuning, where all pre-trained parameters are frozen and only individual heads are trained atop pre-trained LLMs. Our theoretical analysis reveals that the risk initially increases and then decreases with rising model…
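The head tuning setup described in the abstract (frozen pre-trained parameters, trainable head only) can be sketched in a few lines. This is a minimal sketch under loud assumptions: a fixed random projection `W_frozen` stands in for the pre-trained model, the downstream task is synthetic, and the head is a logistic-regression layer trained by plain gradient descent. None of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained backbone (assumption): a fixed random
# projection followed by tanh. It is never updated during training.
W_frozen = rng.normal(size=(20, 8))


def features(x):
    return np.tanh(x @ W_frozen)


# Synthetic downstream task (assumption): binary labels from a linear rule.
X = rng.normal(size=(200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(head, bias, X, y):
    p = sigmoid(features(X) @ head + bias)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


def train_head(X, y, steps=500, lr=0.5):
    """Fit only the linear head on top of the frozen features."""
    H = features(X)  # computed once because the backbone is frozen
    head = np.zeros(H.shape[1])
    bias = 0.0
    for _ in range(steps):
        grad = sigmoid(H @ head + bias) - y
        head -= lr * (H.T @ grad) / len(y)
        bias -= lr * grad.mean()
    return head, bias


W_before = W_frozen.copy()
loss_before = log_loss(np.zeros(8), 0.0, X, y)
head, bias = train_head(X, y)
loss_after = log_loss(head, bias, X, y)
```

Varying the feature width (8 here) is one simple way to vary the capacity of the frozen model while keeping the training procedure for the head unchanged.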

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Focus