Leveraging Conditional Mutual Information to Improve Large Language Model Fine-Tuning For Classification
Thanushon Sivakaran, En-Hui Yang

TL;DR
This paper introduces Conditional Mutual Information (CMI) into the fine-tuning of large language models for classification. Minimizing CMI during fine-tuning improves a model's standalone performance, and maximizing CMI during knowledge distillation yields stronger student models, with gains on 6 of 8 GLUE tasks in each setting.
Contribution
The paper adapts the CMI-constrained deep learning framework, originally developed for image classification, to LLM fine-tuning, and demonstrates that it improves both standalone models and distilled student models on classification tasks.
Findings
Minimizing CMI improves BERT performance on 6 of 8 GLUE tasks.
Maximizing CMI enhances DistilBERT performance on 6 of 8 GLUE tasks.
CMI-based fine-tuning outperforms baseline methods in several classification benchmarks.
Abstract
Although large language models (LLMs) have demonstrated remarkable capabilities in recent years, the potential of information theory (IT) to enhance LLM development remains underexplored. This paper introduces the information theoretic principle of Conditional Mutual Information (CMI) to LLM fine-tuning for classification tasks, exploring its promise in two main ways: minimizing CMI to improve a model's standalone performance and maximizing CMI to enhance knowledge distillation (KD) for more capable student models. To apply CMI in LLM fine-tuning, we adapt the recently proposed CMI-constrained deep learning framework, which was initially developed for image classification, with some modification. By minimizing CMI during LLM fine-tuning, we achieve superior performance gains on 6 of 8 GLUE classification tasks compared to BERT. Additionally, maximizing CMI during the KD process results…
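The CMI objective described in the abstract can be illustrated with a minimal sketch. It assumes the usual definition from the CMI-constrained deep learning framework, where the CMI I(X; Ŷ | Y) is estimated on a batch as the average KL divergence between each sample's predicted distribution and the mean predicted distribution of its ground-truth class. The function names, the weight `lam`, and the pure-Python batch estimator are hypothetical choices made for illustration. The paper's own modifications for LLM fine-tuning are not described in this summary and are not reproduced here.

```python
import math


def softmax(logits):
    """Convert one sample's logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def empirical_cmi(probs, labels):
    """Batch estimate of I(X; Y_hat | Y) (assumed estimator, see lead-in)."""
    by_class = {}
    for p, y in zip(probs, labels):
        by_class.setdefault(y, []).append(p)
    n = len(labels)
    cmi = 0.0
    for group in by_class.values():
        # Class centroid: mean predicted distribution over samples of this class.
        centroid = [sum(col) / len(group) for col in zip(*group)]
        # Dividing the per-class KL sum by n weights each class by its
        # empirical frequency in the batch.
        cmi += sum(kl_divergence(p, centroid) for p in group) / n
    return cmi


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true labels."""
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(labels)


def cmi_regularized_loss(logits_batch, labels, lam):
    """Cross-entropy plus lam * CMI; lam is a hypothetical trade-off weight."""
    probs = [softmax(z) for z in logits_batch]
    return cross_entropy(probs, labels) + lam * empirical_cmi(probs, labels)


if __name__ == "__main__":
    logits = [[2.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 3.0]]
    labels = [0, 0, 1, 1]
    print(cmi_regularized_loss(logits, labels, lam=0.5))
```

In this sketch a positive `lam` penalizes spread among the predictions within each class, which corresponds to the CMI-minimization setting. A negative `lam` would reward that spread, loosely matching the CMI-maximization setting mentioned for knowledge distillation. In practice the same computation would be written with a tensor library so that gradients flow through the CMI term.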
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Adam · Softmax · Dropout · Weight Decay · WordPiece · Layer Normalization · Residual Connection · Linear Layer
