Optimising Language Models for Downstream Tasks: A Post-Training Perspective
Zhengyan Shi

TL;DR
This thesis introduces post-training techniques and evaluation methods that improve language models' efficiency, robustness, and adaptability across diverse downstream NLP tasks, addressing limitations of traditional fine-tuning.
Contribution
It presents a series of methods including a novel continued pre-training approach, parameter-efficient fine-tuning, and new benchmarks for better LM adaptation and evaluation.
Findings
Outperforms state-of-the-art semi-supervised approaches
Reduces memory and compute costs significantly
Enhances performance on instruction-following and reasoning tasks
Abstract
Language models (LMs) have demonstrated remarkable capabilities in NLP, yet adapting them efficiently and robustly to specific tasks remains challenging. As their scale and complexity grow, fine-tuning LMs on labelled data often underutilizes available unlabelled data, leads to overfitting on small task-specific sets, and imposes significant computational costs. These limitations hamper their application to the open-ended landscape of real-world language tasks. This thesis proposes a series of methods to better adapt LMs to downstream applications. First, we explore strategies for extracting task-relevant knowledge from unlabelled data, introducing a novel continued pre-training technique that outperforms state-of-the-art semi-supervised approaches. Next, we present a parameter-efficient fine-tuning method that substantially reduces memory and compute costs while maintaining…
Taxonomy
Topics: Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning
