A Data Driven Approach to Query Expansion in Question Answering
Leon Derczynski, Jun Wang, Robert Gaizauskas, Mark A. Greenwood

TL;DR
This paper investigates a data-driven method for query expansion in question answering systems, showing that using answer texts from past evaluations can improve retrieval performance on difficult questions.
Contribution
It introduces an approach that mines answer texts from previous QA evaluations for words that improve retrieval when added to a query, and uses those words to evaluate query expansion methods such as blind relevance feedback.
Findings
Data-driven expansion words improve performance on over 70% of difficult questions
Simple blind relevance feedback is generally ineffective for QA IR tasks
Analysis of previous QA data helps identify useful expansion terms
Abstract
Automated answering of natural language questions is an interesting and useful problem to solve. Question answering (QA) systems often perform information retrieval at an initial stage. Information retrieval (IR) performance, provided by engines such as Lucene, places a bound on overall system performance. For example, no answer bearing documents are retrieved at low ranks for almost 40% of questions. In this paper, answer texts from previous QA evaluations held as part of the Text REtrieval Conferences (TREC) are paired with queries and analysed in an attempt to identify performance-enhancing words. These words are then used to evaluate the performance of a query expansion method. Data driven extension words were found to help in over 70% of difficult questions. These words can be used to improve and evaluate query expansion methods. Simple blind relevance feedback (RF) was…
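The idea of identifying performance-enhancing words can be illustrated with a minimal sketch. It assumes a toy term-count ranker in place of a real engine such as Lucene, and every function name and the sample data are hypothetical, not taken from the paper. A candidate word from a known answer text counts as helpful if adding it to the query moves the first answer-bearing document to a better rank.

```python
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]


def rank_documents(query_terms, docs):
    """Toy ranker (an assumption, not Lucene): score = summed counts of query terms."""
    scores = []
    for doc_id, text in enumerate(docs):
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in set(query_terms))
        scores.append((-score, doc_id))  # ties are broken by document id
    return [doc_id for _, doc_id in sorted(scores)]


def first_answer_rank(ranking, answer_doc_ids):
    """1-based rank of the first answer-bearing document, or None if absent."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in answer_doc_ids:
            return rank
    return None


def helpful_expansion_words(question, answer_text, docs, answer_doc_ids):
    """Return answer-text words that improve the first answer-bearing rank."""
    query = tokenize(question)
    baseline = first_answer_rank(rank_documents(query, docs), answer_doc_ids)
    helpful = []
    for word in sorted(set(tokenize(answer_text)) - set(query)):
        rank = first_answer_rank(rank_documents(query + [word], docs), answer_doc_ids)
        if rank is not None and (baseline is None or rank < baseline):
            helpful.append(word)
    return helpful


# Hypothetical two-document collection; document 1 is the answer-bearing one.
docs = [
    "A river is long",
    "The Nile river flows north and the Nile crosses Egypt",
]
print(helpful_expansion_words("Which river is longest?", "Nile Egypt", docs, {1}))
# prints ['nile']
```

In this toy collection "egypt" only ties the scores, so the answer-bearing document stays at rank 2, while "nile" lifts it to rank 1. A real study would replace the toy ranker with the actual retrieval engine and use judged answer-bearing documents.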
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Information Retrieval and Search Behavior
