# End-to-End Open-Domain Question Answering with BERTserini

**Authors:** Wei Yang, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, and Jimmy Lin

arXiv: 1902.01718 · 2019-09-19

## TL;DR

This paper presents BERTserini, an end-to-end open-domain question answering system that combines a BERT-based reader with the Anserini IR toolkit. Fine-tuning pretrained BERT on SQuAD is sufficient for the system to identify answer spans accurately in a large corpus of Wikipedia articles.

## Contribution

The paper integrates a BERT-based reader with the open-source Anserini IR toolkit for open-domain QA and demonstrates the combined system end to end over a large corpus of Wikipedia articles.

## Key findings

- Large improvements over previous results on a standard benchmark test collection.
- Fine-tuning pretrained BERT on SQuAD is sufficient for high-accuracy answer span identification.
- Combining IR best practices with a BERT-based reader lets the system answer questions over a large Wikipedia corpus rather than a small input text.

## Abstract

We demonstrate an end-to-end question answering system that integrates BERT with the open-source Anserini information retrieval toolkit. In contrast to most question answering and reading comprehension models today, which operate over small amounts of input text, our system integrates best practices from IR with a BERT-based reader to identify answers from a large corpus of Wikipedia articles in an end-to-end fashion. We report large improvements over previous results on a standard benchmark test collection, showing that fine-tuning pretrained BERT with SQuAD is sufficient to achieve high accuracy in identifying answer spans.
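The retrieve-then-read design described in the abstract can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the authors' implementation: `bm25_retrieve` is a toy in-memory ranker standing in for Anserini (BM25 is assumed as the ranking function, with arbitrary parameter values), `toy_reader` is a hypothetical placeholder for a BERT model fine-tuned on SQuAD, and the weighted sum of retriever and reader scores controlled by `weight` is also an assumption, since this summary does not say how the two scores are combined.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that drops surrounding punctuation."""
    tokens = (t.strip(".,?!").lower() for t in text.split())
    return [t for t in tokens if t]


def bm25_retrieve(question, passages, k=2, k1=0.9, b=0.4):
    """Toy BM25 ranking over an in-memory list (stand-in for a real IR toolkit)."""
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in set(d))
    scored = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in set(tokenize(question)):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, idx))
    # Highest score first; ties broken by original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(passages[i], s) for s, i in scored[:k]]


def toy_reader(question, passage):
    """Hypothetical reader: returns (answer_span, score).

    A real system would run a fine-tuned BERT model here. This placeholder
    returns the first passage word that does not occur in the question,
    scored by the fraction of passage tokens shared with the question.
    """
    q_terms = set(tokenize(question))
    words = tokenize(passage)
    overlap = sum(1 for w in words if w in q_terms) / max(len(words), 1)
    candidates = [w.strip(".,?!") for w in passage.split()
                  if w.strip(".,?!").lower() not in q_terms]
    return (candidates[0] if candidates else ""), overlap


def answer(question, passages, reader, k=2, weight=0.5):
    """Retrieve top-k passages, read each one, and return the best (span, score)."""
    best = None
    for passage, retrieval_score in bm25_retrieve(question, passages, k):
        span, reader_score = reader(question, passage)
        # Assumed combination rule: weighted sum of the two scores.
        combined = (1 - weight) * retrieval_score + weight * reader_score
        if best is None or combined > best[1]:
            best = (span, combined)
    return best


corpus = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Seine flows through Paris.",
]
print(answer("What is the capital of France?", corpus, toy_reader))
```

Passing the reader in as a callable keeps the retrieval stage independent of the model, so the placeholder could be swapped for a real span-extraction model without touching the ranking code.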

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.01718/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1902.01718/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1902.01718/full.md

---
Source: https://tomesphere.com/paper/1902.01718