Attention-based Contrastive Learning for Winograd Schemas
Tassilo Klein, Moin Nabi

TL;DR
This paper introduces a novel self-supervised contrastive learning framework applied directly to Transformer self-attention to improve commonsense reasoning on Winograd Schemas, outperforming comparable unsupervised methods.
Contribution
It presents the first contrastive learning approach at the attention level for Winograd Schema reasoning, enhancing unsupervised learning capabilities.
Findings
Outperforms all comparable unsupervised approaches
Occasionally surpasses supervised methods
Demonstrates superior commonsense reasoning on multiple datasets
Abstract
Self-supervised learning has recently attracted considerable attention in the NLP community for its ability to learn discriminative features using a contrastive objective. This paper investigates whether contrastive learning can be extended to Transformer attention to tackle the Winograd Schema Challenge. To this end, we propose a novel self-supervised framework that leverages a contrastive loss directly at the level of self-attention. Experimental analysis of our attention-based models on multiple datasets demonstrates superior commonsense reasoning capabilities. The proposed approach outperforms all comparable unsupervised approaches while occasionally surpassing supervised ones.
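The abstract does not spell out the exact form of the attention-level contrastive loss. As a generic illustration only, the sketch below computes scaled dot-product attention weights for a query token and then applies a standard InfoNCE contrastive loss to those attention distributions, pulling an anchor's attention toward a positive and away from negatives. The function names, the temperature of 0.1, and the use of cosine similarity are assumptions made for this example; this is not the authors' formulation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_row(query: Sequence[float], keys: Sequence[Sequence[float]]) -> Vector:
    """Scaled dot-product attention weights of a single query over all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: Sequence[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Generic InfoNCE loss (assumed form, for illustration):

    -log( exp(s_pos / t) / (exp(s_pos / t) + sum_n exp(s_n / t)) )
    where s_* are cosine similarities to the anchor.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # log-sum-exp with max subtraction for numerical stability
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]


if __name__ == "__main__":
    # Hypothetical toy keys for three tokens in a sentence.
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    anchor = attention_row([2.0, 0.0], keys)
    positive = attention_row([1.8, 0.1], keys)
    negative = attention_row([0.0, 2.0], keys)
    print("loss:", info_nce(anchor, positive, [negative]))
```

Because the loss operates on attention distributions rather than on hidden-state embeddings, minimizing it shapes where the model attends; a positive whose attention pattern resembles the anchor's yields a lower loss than one whose pattern resembles a negative.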
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Contrastive Learning
