Making the Most of your Model: Methods for Finetuning and Applying Pretrained Transformers
Davis Yoshida

TL;DR
This paper introduces new finetuning and inference techniques for pretrained transformers, enhancing their efficiency, generative capabilities, and prediction quality across various NLP tasks.
Contribution
It presents two novel finetuning methods and two techniques for improving inference, and provides insights into model-likelihood divergence, broadening the range of applications for pretrained transformers.
Findings
A recurrence mechanism improves transformer decoder efficiency.
Masked language models can initialize non-autoregressive seq2seq models.
Hidden state optimization enhances prediction quality at inference time.
Abstract
This thesis provides methods and analysis of models which make progress on this goal. The techniques outlined are task agnostic, and should provide benefit when used with nearly any transformer LM. We introduce two new finetuning methods which add new capabilities to the models they are used on. The first adds a recurrence mechanism, which removes the fixed window-size constraint and improves the efficiency of a transformer decoder. The second allows masked language models (MLMs) to be used for initialization of both the encoder and decoder of a non-autoregressive sequence-to-sequence transformer, opening up generative applications of models which were previously only used for natural language understanding tasks. We also introduce two new techniques for improving the quality of predictions of any transformer decoder without additional finetuning. One, hidden state optimization, can…
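The abstract describes the recurrence mechanism only at a high level. The following is a generic sketch of the underlying idea, lifting a fixed-window constraint by carrying a summary from one window to the next; it is not the architecture from the thesis. The window size `WINDOW`, the functions `window_model` and `run_with_recurrence`, the `use_memory` flag, and the toy averaging "attention" are all hypothetical stand-ins, with a single float playing the role of the summarised hidden state.

```python
from typing import List, Sequence, Tuple

WINDOW = 4  # hypothetical fixed context size of the base model


def window_model(tokens: Sequence[float], memory: float) -> Tuple[List[float], float]:
    """Toy stand-in for one forward pass over a single window (hypothetical).

    Position i "attends" causally to a memory slot plus tokens[0..i] by
    averaging them; a real transformer would use learned attention instead.
    """
    if not 0 < len(tokens) <= WINDOW:
        raise ValueError("window must hold between 1 and WINDOW tokens")
    outputs = []
    for i in range(len(tokens)):
        visible = [memory] + list(tokens[: i + 1])
        outputs.append(sum(visible) / len(visible))
    # Compress this window's outputs into one summary for the next window.
    new_memory = sum(outputs) / len(outputs)
    return outputs, new_memory


def run_with_recurrence(tokens: Sequence[float], use_memory: bool = True) -> List[float]:
    """Process an arbitrarily long input one fixed-size window at a time."""
    memory = 0.0
    outputs: List[float] = []
    for start in range(0, len(tokens), WINDOW):
        chunk_out, new_memory = window_model(tokens[start:start + WINDOW], memory)
        outputs.extend(chunk_out)
        # With recurrence the summary flows forward; without it each window starts fresh.
        memory = new_memory if use_memory else 0.0
    return outputs


if __name__ == "__main__":
    seq = [1, 2, 3, 4, 5, 6]
    print(run_with_recurrence(seq))
    print(run_with_recurrence(seq, use_memory=False))
```

With `WINDOW = 4`, the six-token input is split into two windows. The fifth output is 3.125 with recurrence but 2.5 without it, because the summary carried over from the first window (1.25) is visible to the second. Compressing each window into a fixed-size summary keeps the per-window cost constant regardless of total input length; how the thesis actually builds and consumes such a summary is not described in this abstract.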
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies
