# Attention Is (not) All You Need for Commonsense Reasoning

**Authors:** Tassilo Klein, Moin Nabi

arXiv: 1905.13497 · 2019-06-03

## TL;DR

This paper shows that the attention maps produced by BERT can be used directly for commonsense reasoning tasks such as the Pronoun Disambiguation Problem and the Winograd Schema Challenge, outperforming the previously reported state of the art. It also cautions that solving such tasks may require more than unsupervised models trained on huge text corpora.

## Contribution

The paper introduces a conceptually simple attention-guided method for commonsense reasoning built on BERT and reports strong empirical performance on multiple datasets.

## Key findings

- BERT's attention can be directly used for commonsense reasoning tasks.
- The proposed method outperforms the previously reported state of the art on the evaluated datasets.
- Commonsense reasoning may require more than unsupervised learning from large corpora.
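
To make the first finding concrete, here is a minimal sketch of how attention values might be turned into a candidate ranking for pronoun disambiguation. It is an illustration under stated assumptions, not the authors' implementation: this summary does not spell out the scoring rule, so the sketch assumes a "keep the maximum per head, then sum and normalize" scheme. The function name `max_attention_scores`, the input layout, and the toy numbers are all hypothetical; in practice the values would come from a BERT model's attention between the pronoun and each candidate.

```python
from typing import Dict, List

# Hypothetical layout: attention[candidate][layer][head] is the attention
# weight linking the pronoun to that candidate in one (layer, head) pair.
Attention = Dict[str, List[List[float]]]


def max_attention_scores(attention: Attention) -> Dict[str, float]:
    """Score candidates by the attention mass they win across heads.

    Assumed rule: in each (layer, head) pair only the candidate with the
    largest attention keeps its value; the kept values are summed per
    candidate and normalized so the scores add up to 1.
    """
    candidates = list(attention)
    reference = attention[candidates[0]]
    totals = {c: 0.0 for c in candidates}
    for layer in range(len(reference)):
        for head in range(len(reference[layer])):
            # Winner for this head; ties go to the first candidate listed.
            best = max(candidates, key=lambda c: attention[c][layer][head])
            totals[best] += attention[best][layer][head]
    norm = sum(totals.values())
    return {c: (totals[c] / norm if norm else 0.0) for c in candidates}


# Toy values for "The trophy doesn't fit in the suitcase because it is too big."
# Two layers with two heads each; the numbers are made up for illustration.
toy = {
    "trophy": [[0.6, 0.1], [0.5, 0.2]],
    "suitcase": [[0.2, 0.3], [0.1, 0.4]],
}
scores = max_attention_scores(toy)
prediction = max(scores, key=scores.get)
```

With these toy values each candidate wins two heads, and "trophy" is predicted because the heads it wins carry more attention mass. The exact formulation used in the paper should be taken from the full text.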

## Abstract

The recently introduced BERT model exhibits strong performance on several language understanding benchmarks. In this paper, we describe a simple re-implementation of BERT for commonsense reasoning. We show that the attentions produced by BERT can be directly utilized for tasks such as the Pronoun Disambiguation Problem and Winograd Schema Challenge. Our proposed attention-guided commonsense reasoning method is conceptually simple yet empirically powerful. Experimental analysis on multiple datasets demonstrates that our proposed system performs remarkably well on all cases while outperforming the previously reported state of the art by a margin. While results suggest that BERT seems to implicitly learn to establish complex relationships between entities, solving commonsense reasoning tasks might require more than unsupervised models learned from huge text corpora.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.13497/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1905.13497/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/1905.13497/full.md

---
Source: https://tomesphere.com/paper/1905.13497