The RepEval 2017 Shared Task: Multi-Genre Natural Language Inference with Sentence Representations
Nikita Nangia, Adina Williams, Angeliki Lazaridou, Samuel R. Bowman

TL;DR
This paper reports on the RepEval 2017 Shared Task, which evaluated neural network sentence representation models on the MultiNLI corpus. All five participating teams beat the BiLSTM and continuous bag of words baselines, and results were fairly consistent across the genre-matched and genre-mismatched test sets.
Contribution
It presents a shared task for evaluating neural sentence representations on MultiNLI and reports that the best single model, which used stacked BiLSTMs with residual connections, reached 74.5% accuracy on the genre-matched test set.
Findings
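The best single model reached 74.5% accuracy on the genre-matched test set.
Results were fairly consistent across the genre-matched and genre-mismatched test sets, suggesting that the submitted systems learned reasonably domain-independent sentence representations.
All five participating teams beat the BiLSTM and continuous bag of words baselines.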
Abstract
This paper presents the results of the RepEval 2017 Shared Task, which evaluated neural network sentence representation learning models on the Multi-Genre Natural Language Inference corpus (MultiNLI) recently introduced by Williams et al. (2017). All of the five participating teams beat the bidirectional LSTM (BiLSTM) and continuous bag of words baselines reported in Williams et al. The best single model used stacked BiLSTMs with residual connections to extract sentence features and reached 74.5% accuracy on the genre-matched test set. Surprisingly, the results of the competition were fairly consistent across the genre-matched and genre-mismatched test sets, and across subsets of the test data representing a variety of linguistic phenomena, suggesting that all of the submitted systems learned reasonably domain-independent representations for sentence meaning.
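As an illustration of the matched versus mismatched comparison described in the abstract, the sketch below computes accuracy separately for examples whose genre was seen in training and for those whose genre was not. It is a minimal sketch, not the shared task's evaluation code: the record layout, the function names, the genre split, and the toy predictions are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical record layout: (genre, gold_label, predicted_label).
# Labels follow the three-way NLI scheme: entailment, neutral, contradiction.


def accuracy_by_group(examples):
    """Return {group: accuracy} for (group, gold, predicted) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, gold, predicted in examples:
        total[group] += 1
        if gold == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def matched_mismatched_accuracy(examples, training_genres):
    """Split accuracy by whether an example's genre appeared in training."""
    relabeled = [
        ("matched" if genre in training_genres else "mismatched", gold, pred)
        for genre, gold, pred in examples
    ]
    return accuracy_by_group(relabeled)


# Toy data invented purely for illustration (not results from the paper).
toy_examples = [
    ("fiction", "entailment", "entailment"),
    ("fiction", "neutral", "contradiction"),
    ("telephone", "contradiction", "contradiction"),
    ("letters", "neutral", "neutral"),
    ("letters", "entailment", "neutral"),
]
# Assumed split for this toy example only.
toy_training_genres = {"fiction", "telephone"}

scores = matched_mismatched_accuracy(toy_examples, toy_training_genres)
for split, acc in sorted(scores.items()):
    print(f"{split}: {acc:.3f}")
```

A small gap between the two numbers is the kind of evidence the abstract points to when it describes the learned representations as reasonably domain-independent.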
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
