Leveraging Conditional Mutual Information to Improve Large Language Model Fine-Tuning For Classification

Thanushon Sivakaran; En-Hui Yang

arXiv:2502.11258·cs.CL·May 1, 2025

TL;DR

This paper applies Conditional Mutual Information (CMI) to fine-tuning large language models for classification. Minimizing CMI during fine-tuning improves BERT on 6 of 8 GLUE classification tasks, and maximizing CMI during knowledge distillation improves DistilBERT on 6 of 8 GLUE tasks.

Contribution

The paper adapts the CMI-constrained deep learning framework, originally developed for image classification, to LLM fine-tuning, and shows that it improves both standalone models and distilled student models on classification tasks.

Findings

01. Minimizing CMI improves BERT performance on 6 of 8 GLUE tasks.

02. Maximizing CMI enhances DistilBERT performance on 6 of 8 GLUE tasks.

03. CMI-based fine-tuning outperforms baseline methods in several classification benchmarks.

Abstract

Although large language models (LLMs) have demonstrated remarkable capabilities in recent years, the potential of information theory (IT) to enhance LLM development remains underexplored. This paper introduces the information theoretic principle of Conditional Mutual Information (CMI) to LLM fine-tuning for classification tasks, exploring its promise in two main ways: minimizing CMI to improve a model's standalone performance and maximizing CMI to enhance knowledge distillation (KD) for more capable student models. To apply CMI in LLM fine-tuning, we adapt the recently proposed CMI-constrained deep learning framework, which was initially developed for image classification, with some modification. By minimizing CMI during LLM fine-tuning, we achieve superior performance gains on 6 of 8 GLUE classification tasks compared to BERT. Additionally, maximizing CMI during the KD process results…
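
To make the quantity in the abstract concrete, here is a minimal sketch of an empirical CMI estimate, I(X; Ŷ | Y), computed from a batch of classifier output distributions. It assumes the usual form from the CMI-constrained framework: for each true class y, take the average KL divergence between each sample's output distribution and the class centroid Q_y, then weight by class frequency. The function name `empirical_cmi` and the `eps` smoothing term are illustrative assumptions, and the modifications this paper makes for LLM fine-tuning are not described in this summary, so this is not the authors' implementation.

```python
import numpy as np


def empirical_cmi(probs, labels, eps=1e-12):
    """Estimate I(X; Y_hat | Y) from per-sample output distributions.

    probs:  (n_samples, n_classes) array, each row a probability vector
            (for example softmax outputs of a fine-tuned classifier).
    labels: (n_samples,) array of ground-truth class indices.
    eps:    small constant to avoid log(0); an assumption of this sketch.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    total_kl = 0.0
    for y in np.unique(labels):
        p_y = probs[labels == y]      # outputs for samples whose true label is y
        q_y = p_y.mean(axis=0)        # class centroid Q_y
        # KL(P_x || Q_y) for every sample x in class y
        kl = np.sum(p_y * (np.log(p_y + eps) - np.log(q_y + eps)), axis=1)
        total_kl += kl.sum()
    # Dividing by n weights each class by its empirical frequency n_y / n
    return total_kl / n


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    tight = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.2, 0.8]]
    spread = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.4, 0.6]]
    print("tight clusters :", empirical_cmi(tight, labels))
    print("spread clusters:", empirical_cmi(spread, labels))
```

A low value means outputs within each class sit close to their centroid, while a higher value means they are more dispersed. Under the reading given in the abstract, a fine-tuning objective that minimizes CMI would add a weighted CMI term to the usual cross-entropy loss, and the distillation setting would push CMI up instead; the weighting and the exact training procedure are assumptions not specified here.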

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Adam · Softmax · Dropout · Weight Decay · WordPiece · Layer Normalization · Residual Connection · Linear Layer