A strong baseline for question relevancy ranking

Ana V. González-Garduño; Isabelle Augenstein; Anders Søgaard

arXiv:1808.08836·cs.CL·August 28, 2018

TL;DR

This paper introduces a simple, fast, language-independent baseline for question relevancy ranking. It outperforms the complex systems submitted to the SemEval-16 and SemEval-17 shared tasks, and even the rankings from Google's search engine.

Contribution

The authors propose a multi-task feed-forward network whose input features are 14 distance measures computed over each question pair, providing a strong, efficient baseline for question relevancy ranking.
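To make the feature idea concrete, here is a minimal sketch of turning a question pair into a vector of distances. It assumes bag-of-words count vectors and four common measures (cosine, Euclidean, Manhattan, Jaccard); these are illustrative choices only, since the 14 measures used in the paper are not listed on this page. The function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercased whitespace tokens (a deliberately simple tokenizer)."""
    return Counter(text.lower().split())


def distance_features(q1, q2):
    """Return [cosine, euclidean, manhattan, jaccard] distances for a question pair.

    Assumption: these four measures stand in for the paper's 14, which are
    not enumerated in this summary.
    """
    a, b = bag_of_words(q1), bag_of_words(q2)
    vocab = sorted(set(a) | set(b))
    # Counter returns 0 for missing words, so both vectors share one vocabulary.
    u = [a[w] for w in vocab]
    v = [b[w] for w in vocab]

    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    cosine = 1.0 - dot / (norm_u * norm_v) if norm_u and norm_v else 1.0
    euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
    manhattan = float(sum(abs(x - y) for x, y in zip(u, v)))
    union = set(a) | set(b)
    jaccard = 1.0 - len(set(a) & set(b)) / len(union) if union else 0.0
    return [cosine, euclidean, manhattan, jaccard]
```

Because the features depend only on token overlap and vector geometry, nothing in this sketch is tied to a particular language, which matches the language-independence the authors emphasize.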

Findings

01 Outperforms state-of-the-art shared task systems

02 Faster training with simple features

03 Surpasses Google search rankings in relevancy retrieval

Abstract

The best systems at the SemEval-16 and SemEval-17 community question answering shared tasks -- a task that amounts to question relevancy ranking -- involve complex pipelines and manual feature engineering. Despite this, many of these still fail at beating the IR baseline, i.e., the rankings provided by Google's search engine. We present a strong baseline for question relevancy ranking by training a simple multi-task feed forward network on a bag of 14 distance measures for the input question pair. This baseline model, which is fast to train and uses only language-independent features, outperforms the best shared task systems on the task of retrieving relevant previously asked questions.
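The architecture described in the abstract can be sketched as a shared hidden layer feeding one output head per task. The sketch below shows only an untrained forward pass and a ranking helper in plain Python. The hidden size, ReLU and sigmoid activations, random initialization, and the task names "main" and "auxiliary" are all assumptions for illustration, not details taken from the paper.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine transform: one output per weight row."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MultiTaskFFN:
    """Feed-forward net with a shared hidden layer and one head per task."""

    def __init__(self, n_features=14, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.shared = make_layer(n_features, n_hidden, rng)
        # Hypothetical task names; both heads read the same hidden layer.
        self.heads = {
            "main": make_layer(n_hidden, 1, rng),
            "auxiliary": make_layer(n_hidden, 1, rng),
        }

    def predict(self, features, task="main"):
        """Score in (0, 1) for the given task; weights here are untrained."""
        hidden = [max(0.0, h) for h in dense(features, self.shared)]  # ReLU
        return sigmoid(dense(hidden, self.heads[task])[0])


def rank_candidates(model, candidates):
    """Sort (question_id, feature_vector) pairs by descending main-task score."""
    return sorted(candidates, key=lambda c: model.predict(c[1]), reverse=True)
```

In a multi-task setup of this shape, training on both tasks updates the shared layer, so the auxiliary task acts as a regularizer for the main relevancy score. Training code is omitted here because the page does not describe the loss or optimizer.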

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications