# SocialIQA: Commonsense Reasoning about Social Interactions

**Authors:** Maarten Sap, Hannah Rashkin, Derek Chen, Ronan LeBras, Yejin Choi

arXiv: 1904.09728 · 2019-09-10

## TL;DR

Social IQa is a large-scale benchmark designed to evaluate and improve commonsense reasoning about social interactions, highlighting the gap between current models and human understanding.

## Contribution

The paper introduces the first large-scale benchmark for commonsense reasoning about social interactions and demonstrates its utility as a transfer-learning resource for other commonsense tasks.

## Key findings

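- The benchmark is challenging for existing question-answering models based on pretrained language models.
- These models trail human performance by a gap of more than 20%.
- Transfer learning from Social IQa achieves state-of-the-art performance on other commonsense reasoning tasks (Winograd Schemas, COPA).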

## Abstract

We introduce Social IQa, the first large-scale benchmark for commonsense reasoning about social situations. Social IQa contains 38,000 multiple choice questions for probing emotional and social intelligence in a variety of everyday situations (e.g., Q: "Jordan wanted to tell Tracy a secret, so Jordan leaned towards Tracy. Why did Jordan do this?" A: "Make sure no one else could hear"). Through crowdsourcing, we collect commonsense questions along with correct and incorrect answers about social interactions, using a new framework that mitigates stylistic artifacts in incorrect answers by asking workers to provide the right answer to a different but related question. Empirical results show that our benchmark is challenging for existing question-answering models based on pretrained language models, compared to human performance (>20% gap). Notably, we further establish Social IQa as a resource for transfer learning of commonsense knowledge, achieving state-of-the-art performance on multiple commonsense reasoning tasks (Winograd Schemas, COPA).
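To make the multiple-choice format concrete, here is a minimal Python sketch of how one item and an accuracy score could be represented. It is an illustration only, not the authors' data schema or evaluation code: the names `SocialQuestion`, `accuracy`, and `always_first` are hypothetical, the two incorrect answers are invented placeholders, and the number of answer options per question is an assumption. Only the context, question, and correct answer come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class SocialQuestion:
    """One multiple-choice item (hypothetical schema, not the official one)."""
    context: str
    question: str
    answers: Tuple[str, ...]
    label: int  # index of the correct answer in `answers`


def accuracy(items: Sequence[SocialQuestion],
             predict: Callable[[SocialQuestion], int]) -> float:
    """Fraction of items for which `predict` returns the correct index."""
    if not items:
        raise ValueError("cannot score an empty set of items")
    correct = sum(1 for item in items if predict(item) == item.label)
    return correct / len(items)


# Context, question, and correct answer are quoted from the abstract;
# the two incorrect answers are invented placeholders for illustration.
EXAMPLE = SocialQuestion(
    context="Jordan wanted to tell Tracy a secret, so Jordan leaned towards Tracy.",
    question="Why did Jordan do this?",
    answers=(
        "Get a better look at the room",        # placeholder distractor
        "Make sure no one else could hear",     # correct answer (from the abstract)
        "Show everyone what they were saying",  # placeholder distractor
    ),
    label=1,
)


def always_first(item: SocialQuestion) -> int:
    """Trivial baseline that always picks the first option."""
    return 0


if __name__ == "__main__":
    print(accuracy([EXAMPLE], always_first))
    print(accuracy([EXAMPLE], lambda item: item.label))
```

Under this sketch, the human-model gap reported in the abstract would correspond to the difference between two such accuracy values, one computed from human answers and one from model predictions.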

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1904.09728/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1904.09728/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/1904.09728/full.md

---
Source: https://tomesphere.com/paper/1904.09728