Exploring Strategies for Generalizable Commonsense Reasoning with Pre-trained Models

Kaixin Ma; Filip Ilievski; Jonathan Francis; Satoru Ozaki; Eric Nyberg; Alessandro Oltramari

arXiv:2109.02837 · cs.CL · September 8, 2021

TL;DR

This paper investigates how different adaptation methods affect pre-trained models' ability to generalize in commonsense reasoning tasks, highlighting fine-tuning's strengths and limitations compared to lightweight alternatives.

Contribution

It provides a comparative analysis of fine-tuning and lightweight adaptation methods on commonsense reasoning, emphasizing their impact on generalization and robustness.

Findings

1. Fine-tuning achieves the highest accuracy but overfits and limits generalization.

2. Prefix-tuning offers comparable accuracy with better generalization to unseen answers.

3. Lightweight methods are more robust to adversarial data splits.

Abstract

Commonsense reasoning benchmarks have been largely solved by fine-tuning language models. The downside is that fine-tuning may cause models to overfit to task-specific data and thereby forget their knowledge gained during pre-training. Recent works only propose lightweight model updates as models may already possess useful knowledge from past experience, but a challenge remains in understanding what parts and to what extent models should be refined for a given task. In this paper, we investigate what models learn from commonsense reasoning datasets. We measure the impact of three different adaptation methods on the generalization and accuracy of models. Our experiments with two models show that fine-tuning performs best, by learning both the content and the structure of the task, but suffers from overfitting and limited generalization to novel answers. We observe that alternative…

Code & Models

Repositories

mayer123/cs_model_adaptation (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications