It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning
Alexey Tikhonov, Max Ryabinin

TL;DR
This paper proposes a simple linear classifier that uses multi-head attention weights as features for cross-lingual commonsense reasoning, and presents evidence that a small subset of attention heads drives most of the performance across languages.
Contribution
The paper introduces a straightforward method that uses attention head weights as features for multilingual commonsense reasoning, and provides evidence of universal reasoning capabilities in multilingual models.
Findings
The method performs competitively with recent approaches in multilingual settings.
Most performance is driven by a small subset of attention heads across languages.
Universal reasoning capabilities are evidenced in multilingual encoders.
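The finding about a small subset of heads can be illustrated with a short sketch. The page does not say how the authors identified that subset, so this assumes one plausible approach: rank heads by the absolute value of their linear classifier weight. The helper names `top_heads` and `weight_share` and the example weights are hypothetical.

```python
import numpy as np

def top_heads(weights, n_layers, n_heads, k=3):
    """Return the k (layer, head) pairs with the largest absolute weight.

    Assumes `weights` is a flat vector with one entry per head,
    ordered layer by layer (an assumption made for this sketch).
    """
    w = np.abs(np.asarray(weights, dtype=float)).reshape(n_layers, n_heads)
    order = np.argsort(w, axis=None)[::-1][:k]  # flat indices, largest first
    return [tuple(int(i) for i in np.unravel_index(j, w.shape)) for j in order]

def weight_share(weights, k):
    """Fraction of total absolute weight carried by the k largest entries."""
    w = np.sort(np.abs(np.asarray(weights, dtype=float)))[::-1]
    return float(w[:k].sum() / w.sum())

# Made-up weights for a toy model with 2 layers and 3 heads per layer.
toy_weights = [0.1, -2.0, 0.05, 0.0, 1.5, 0.2]
best = top_heads(toy_weights, n_layers=2, n_heads=3, k=2)
share = weight_share(toy_weights, k=2)
```

With these toy numbers, two of the six heads carry most of the absolute weight. Weight magnitude is only a rough proxy for importance, and other criteria (for example, ablating heads and measuring accuracy) could give a different ranking.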
Abstract
Commonsense reasoning is one of the key problems in natural language processing, but the relative scarcity of labeled data holds back the progress for languages other than English. Pretrained cross-lingual models are a source of powerful language-agnostic representations, yet their inherent reasoning capabilities are still actively studied. In this work, we design a simple approach to commonsense reasoning which trains a linear classifier with weights of multi-head attention as features. To evaluate this approach, we create a multilingual Winograd Schema corpus by processing several datasets from prior work within a standardized pipeline and measure cross-lingual generalization ability in terms of out-of-sample performance. The method performs competitively with recent supervised and unsupervised approaches for commonsense reasoning, even when applied to other languages in a zero-shot…
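The approach described in the abstract, a linear classifier over multi-head attention weights, can be sketched roughly as follows. This is a toy illustration, not the authors' implementation. It assumes each head's feature is the mean attention from the pronoun to a candidate's tokens, and that the two candidates are compared by feature difference. The attention tensors are synthetic, and all function names are hypothetical.

```python
import numpy as np

def head_features(attn, pronoun_idx, candidate_idxs):
    """One feature per head: mean attention from the pronoun to the candidate tokens.

    attn has shape (layers, heads, seq, seq), with rows as query positions.
    The attention direction (pronoun to candidate) is an assumption of this sketch.
    """
    scores = attn[:, :, pronoun_idx, candidate_idxs]  # (layers, heads, n_tokens)
    return scores.mean(axis=-1).reshape(-1)

def example_features(attn, pronoun_idx, cand_a, cand_b):
    """Difference between the per-head features of the two candidates."""
    return head_features(attn, pronoun_idx, cand_a) - head_features(attn, pronoun_idx, cand_b)

def fit_logistic(X, y, lr=0.5, steps=500):
    """Plain gradient-descent logistic regression (the linear classifier)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

# Synthetic data: random attention maps, with a toy signal planted in one head.
rng = np.random.default_rng(0)
layers, heads, seq = 2, 4, 8
cand_a, cand_b, pron = [1, 2], [4], 6
rows, labels = [], []
for _ in range(200):
    attn = rng.random((layers, heads, seq, seq))
    attn /= attn.sum(axis=-1, keepdims=True)
    label = int(rng.integers(0, 2))  # 1 means candidate A is the referent
    correct = cand_a if label == 1 else cand_b
    attn[1, 3, pron, correct] += 0.5  # planted signal; rows are no longer normalized
    rows.append(example_features(attn, pron, cand_a, cand_b))
    labels.append(label)

X, y = np.array(rows), np.array(labels)
w, b = fit_logistic(X, y)
accuracy = float((predict(X, w, b) == y).mean())
```

In a real setting the attention tensors would come from a pretrained multilingual encoder rather than a random generator, and accuracy would be measured on held-out examples, including languages not seen when fitting the classifier.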
Taxonomy
Methods: Softmax · Linear Layer
