A Common Evaluation Setting for Just.Ask, Open Ephyra and Aranea QA systems
Ricardo Pires

TL;DR
This paper proposes a unified evaluation framework for comparing QA systems like Just.Ask, Open Ephyra, and Aranea, addressing inconsistencies in testing conditions and analyzing the impact of different pipeline stages.
Contribution
It introduces a common evaluation setting for multiple QA systems, enabling fair comparison and analysis of their components and techniques.
Findings
Standardized evaluation setting facilitates fair comparison
Analysis of pipeline stage impact on QA performance
Insights into technique transferability between systems
Abstract
Question Answering (QA) is not a new research field in Natural Language Processing (NLP). However, in recent years QA has been a subject of growing study. Nowadays, most QA systems have a similar pipelined architecture, and each system uses a set of unique techniques to achieve its state-of-the-art results. However, many aspects of QA processing remain unclear. It is not clear to what extent tasks performed in earlier stages affect the following stages of the pipeline. It is not clear whether techniques used in one QA system can be used in another QA system to improve its results. Finally, it is not clear in what setting these systems should be tested in order to properly analyze their results.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Bayesian Modeling and Causal Inference
