Ensemble ALBERT on SQuAD 2.0

Shilun Li; Renee Li; Veronica Peng

arXiv:2110.09665·cs.CL·October 20, 2021

TL;DR

This paper improves question answering on SQuAD 2.0 by fine-tuning ALBERT models with additional layers and combining them with ensemble methods, reaching an F1 of 88.435 on the dev set with the best individual model and 90.123 on the leaderboard with ensembling.

Contribution

The paper builds several models that stack additional layers on top of ALBERT and applies ensemble algorithms to their outputs to improve SQuAD 2.0 performance.
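This page does not say which ensemble algorithms the authors use, so the following is only an illustrative sketch of one common choice: weighted voting over the answer strings produced by several models. The function name `weighted_vote`, the per-model weights, and the convention that an empty string means "no answer" are assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-model answers by weighted voting (hypothetical helper).

    predictions: list of dicts mapping question id -> answer text,
                 where "" stands for "no answer" (an assumed convention).
    weights:     one non-negative weight per model, e.g. its dev-set F1.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model")

    final = {}
    for qid in predictions[0]:
        scores = defaultdict(float)
        for model_preds, weight in zip(predictions, weights):
            # Each model adds its weight to the answer it proposed.
            scores[model_preds[qid]] += weight
        # Pick the answer with the largest total weight; ties go to the
        # answer that was proposed first.
        final[qid] = max(scores, key=scores.get)
    return final


# Toy example with three hypothetical models.
model_a = {"q1": "Paris", "q2": ""}
model_b = {"q1": "Paris, France", "q2": "1990"}
model_c = {"q1": "Paris, France", "q2": ""}
combined = weighted_vote([model_a, model_b, model_c], [0.5, 0.3, 0.3])
```

Voting on exact strings is the simplest variant. An ensemble could instead average start and end logits across models, but that requires the models to share a tokenization.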

Findings

01. The best individual model reached an F1 score of 88.435 on the dev set.

02. Ensemble methods raised F1 to 90.123 on the leaderboard.

03. The model variations outperform the baseline ALBERT models.
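The F1 figures above are token-overlap scores. As a rough illustration of how such a score is computed for a single question, here is a simplified sketch modeled on the conventions of the official SQuAD evaluation script (lowercasing, stripping punctuation and English articles, and treating two empty answers as a match). The function names `normalize` and `token_f1` are chosen for this example, and the code is not the authors' evaluation code.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, gold):
    """Token-level F1 between a predicted and a gold answer string."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()

    # Unanswerable questions: the score is 1 only if both sides are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    num_same = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if num_same == 0:
        return 0.0

    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score such as 88.435 would then be the mean of these per-question values, scaled to 0-100, usually taking the maximum over the available gold answers for each question.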

Abstract

Machine question answering is an essential yet challenging task in natural language processing. Recently, Pre-trained Contextual Embeddings (PCE) models such as Bidirectional Encoder Representations from Transformers (BERT) and A Lite BERT (ALBERT) have attracted considerable attention because of their strong performance across a wide range of NLP tasks. In this paper, we fine-tune ALBERT models and add combinations of additional layers (e.g. an attention layer, an RNN layer) on top of them to improve performance on the Stanford Question Answering Dataset (SQuAD 2.0). We implement four models with different layers on top of the ALBERT-base model, and two further models based on ALBERT-xlarge and ALBERT-xxlarge. We compare their performance in detail against our baseline model, ALBERT-base-v2 + ALBERT-SQuAD-out. Our best-performing individual model is ALBERT-xxlarge +…

Code & Models

Repositories

Shilun-Allan-Li/Ensemble-ALBERT-on-SQuAD-2.0
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Weight Decay · Softmax · Linear Warmup With Linear Decay · Residual Connection · WordPiece · Attention Dropout