Gestalt: a Stacking Ensemble for SQuAD2.0
Mohamed El-Geish

TL;DR
This paper introduces a stacking ensemble for SQuAD2.0 that blends the top-N predictions of two models, based on ALBERT and RoBERTa, and improves on the best single model in the ensemble by a relative 0.55% in EM and 0.61% in F1.
Contribution
It presents a stacking ensemble that recasts answer selection as a multiclass classification task over the base models' top-N predictions, using a CNN-based meta-model to blend heterogeneous SQuAD2.0 models.
Findings
Achieved test-set scores of 87.117 EM and 90.306 F1 with the best-performing ensemble.
Improved on the best single model in the ensemble by a relative 0.55% in EM and 0.61% in F1.
Showed that a stacking ensemble can outperform its best constituent model on SQuAD2.0.
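The improvements above are relative, not absolute, percentage points. The short sketch below shows the arithmetic. The function names are hypothetical, and the baseline values it derives are back-computed from the rounded percentages reported here, so they are approximations and not figures taken from the paper.

```python
def relative_improvement_pct(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


def implied_baseline(new: float, rel_pct: float) -> float:
    """Baseline score implied by a new score and a relative improvement in percent."""
    return new / (1.0 + rel_pct / 100.0)


# Reported ensemble scores and relative improvements.
ENSEMBLE_EM, ENSEMBLE_F1 = 87.117, 90.306
REL_EM_PCT, REL_F1_PCT = 0.55, 0.61

# Approximate baselines (assumption: derived from rounded percentages, not reported values).
em_base = implied_baseline(ENSEMBLE_EM, REL_EM_PCT)
f1_base = implied_baseline(ENSEMBLE_F1, REL_F1_PCT)

# The absolute gains are smaller than the relative percentages suggest at first glance.
em_gain_points = ENSEMBLE_EM - em_base
f1_gain_points = ENSEMBLE_F1 - f1_base
```

Under these assumptions, each gain works out to roughly half a point on the 0-100 scale.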
Abstract
We propose a deep-learning system -- for the SQuAD2.0 task -- that finds, or indicates the lack of, a correct answer to a question in a context paragraph. Our goal is to learn an ensemble of heterogeneous SQuAD2.0 models that, when blended properly, outperforms the best model in the ensemble per se. We created a stacking ensemble that combines top-N predictions from two models, based on ALBERT and RoBERTa, into a multiclass classification task to pick the best answer out of their predictions. We explored various ensemble configurations, input representations, and model architectures. For evaluation, we examined test-set EM and F1 scores; our best-performing ensemble incorporated a CNN-based meta-model and scored 87.117 and 90.306, respectively -- a relative improvement of 0.55% for EM and 0.61% for F1 scores, compared to the baseline performance of the best model in the ensemble, an…
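The stacking idea in the abstract, pooling each base model's top-N predictions and letting a meta-model pick one candidate as the answer, can be sketched as follows. This is a minimal illustration under stated assumptions and not the paper's implementation: the `Prediction` type, the helper names, the two-feature input representation, and the fixed-weight linear scorer are all hypothetical, and the linear scorer merely stands in for the CNN-based meta-model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    text: str    # answer span; "" stands for "no answer"
    prob: float  # the base model's probability for this span


def build_candidates(albert_top_n, roberta_top_n):
    """Merge two top-N lists into one candidate list with one feature per model."""
    features = {}
    for model_idx, preds in enumerate((albert_top_n, roberta_top_n)):
        for p in preds:
            row = features.setdefault(p.text, [0.0, 0.0])
            row[model_idx] = max(row[model_idx], p.prob)
    texts = list(features)
    return texts, [features[t] for t in texts]


def pick_answer(texts, feats, weights=(0.5, 0.5)):
    """Stand-in meta-model: score every candidate class and return the argmax."""
    scores = [sum(w * x for w, x in zip(weights, row)) for row in feats]
    best = max(range(len(texts)), key=scores.__getitem__)
    return texts[best]


# Made-up top-3 predictions for one question (illustrative values only).
albert = [Prediction("in 1066", 0.55), Prediction("", 0.30), Prediction("1066", 0.15)]
roberta = [Prediction("1066", 0.50), Prediction("in 1066", 0.40), Prediction("", 0.10)]

texts, feats = build_candidates(albert, roberta)
answer = pick_answer(texts, feats)
```

In a trained stacking ensemble, the fixed weights would be replaced by a learned classifier fitted on held-out base-model predictions, so that the meta-model learns when to trust each base model.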
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research
Methods: Linear Layer · Weight Decay · Dropout · Attention Dropout · Linear Warmup With Linear Decay · BERT · RoBERTa · Residual Connection · Adam · LAMB
