Go Beyond Plain Fine-tuning: Improving Pretrained Models for Social Commonsense

Ting-Yun Chang; Yang Liu; Karthik Gopalakrishnan; Behnam Hedayatnia; Pei Zhou; Dilek Hakkani-Tur

arXiv:2105.05913 · cs.CL · May 14, 2021 · 1 citation

TL;DR

This paper enhances pretrained language models like RoBERTa and GPT-2 for social commonsense reasoning tasks, specifically improving performance on the Social IQA dataset by proposing architecture modifications and leveraging external data.

Contribution

It introduces new architecture variations and external data integration methods to improve pretrained models' social commonsense reasoning capabilities.

Findings

01 Achieves competitive results on the Social IQA leaderboard

02 Demonstrates the effectiveness of architecture extensions and external data

03 Shows that pretrained models can be tailored for social intelligence tasks

Abstract

Pretrained language models have recently demonstrated outstanding performance on many NLP tasks. However, their social intelligence, which requires commonsense reasoning about the current situation and the mental states of others, is still developing. To improve language models' social intelligence, we focus on the Social IQA dataset, a task requiring social and emotional commonsense reasoning. Building on the pretrained RoBERTa and GPT2 models, we propose several architecture variations and extensions, and leverage external commonsense corpora, to optimize the model for Social IQA. Our proposed system achieves results competitive with those of the top-ranking models on the leaderboard. This work demonstrates the strengths of pretrained language models and provides viable ways to improve their performance on a particular task.
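
For readers unfamiliar with the task format: Social IQA poses each item as a context, a question, and three candidate answers. The sketch below shows one common way to frame such an item for plain fine-tuning: pair the context and question with each candidate, score every pair, and normalize the scores with a softmax. It is an illustrative baseline under stated assumptions, not the architecture proposed in the paper. The `[SEP]` separator, the `toy_score` function (a stand-in for an encoder followed by a linear layer), and the example item are all made up for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def build_inputs(context: str, question: str, answers: Sequence[str]) -> List[str]:
    """Pair the shared context and question with each candidate answer.

    "[SEP]" is a placeholder separator, not any specific tokenizer's token.
    """
    return [f"{context} {question} [SEP] {answer}" for answer in answers]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the candidate scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(
    context: str,
    question: str,
    answers: Sequence[str],
    score_fn: Callable[[str], float],
) -> Tuple[int, List[float]]:
    """Score each candidate independently and return (best index, probabilities)."""
    inputs = build_inputs(context, question, answers)
    probs = softmax([score_fn(text) for text in inputs])
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


def toy_score(text: str) -> float:
    """Hypothetical scorer: a real system would use a fine-tuned model here."""
    return float(text.count("thank"))


# Made-up example item in the context / question / three-answers format.
best, probs = predict(
    "Alex helped Jordan move into a new apartment.",
    "How would Jordan feel afterwards?",
    ["annoyed at Alex", "thankful for the help", "indifferent"],
    toy_score,
)
```

In a real setup, `score_fn` would wrap a pretrained encoder with a scoring head, and the softmax probabilities would feed a cross-entropy loss over the three candidates during fine-tuning.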

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Layer Normalization · Softmax · Multi-Head Attention · Residual Connection · WordPiece · Weight Decay