Making the Most of your Model: Methods for Finetuning and Applying Pretrained Transformers

Davis Yoshida

arXiv:2408.16241·cs.CL·August 30, 2024

TL;DR

This paper introduces new finetuning and inference techniques for pretrained transformers, enhancing their efficiency, generative capabilities, and prediction quality across various NLP tasks.

Contribution

It presents two novel finetuning methods, two inference improvement techniques, and provides insights into model-likelihood divergence, broadening transformer application scope.

Findings

01. Recurrence mechanism improves transformer decoder efficiency.
02. Masked language models can initialize non-autoregressive seq2seq models.
03. Hidden state optimization enhances prediction quality at inference.

Abstract

This thesis provides methods and analysis of models which make progress on this goal. The techniques outlined are task-agnostic, and should provide benefit when used with nearly any transformer LM. We introduce two new finetuning methods which add new capabilities to the models they are used on. The first adds a recurrence mechanism, which removes the fixed-window size constraint and improves the efficiency of a transformer decoder. The second allows masked language models (MLMs) to be used for initialization of both the encoder and decoder of a non-autoregressive sequence-to-sequence transformer, opening up generative applications of models which were previously only used for natural language understanding tasks. We also introduce two new techniques for improving the quality of predictions of any transformer decoder without additional finetuning. One, hidden state optimization, can…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies