TL;DR
This paper frames semantic search as paraphrase span detection and, on Finnish paraphrase data, reports exact-match gains of 22.4 to 31.9 percentage points over lexical and sentence-embedding retrieval baselines.
Contribution
It introduces a span detection model for semantic search and a back-translation method for generating training data in low-resource languages.
Findings
Span detection model outperforms the lexical similarity and BERT sentence-embedding baselines by 31.9pp and 22.4pp, respectively, in exact match.
In token-level F-score, the margins over the same two baselines are 22.3pp and 12.9pp.
Back-translation method enables training data creation for languages lacking annotated resources.
Abstract
In this paper, we approach the problem of semantic search by framing the search task as paraphrase span detection, i.e. given a segment of text as a query phrase, the task is to identify its paraphrase in a given document, the same modelling setup as typically used in extractive question answering. On the Turku Paraphrase Corpus of 100,000 manually extracted Finnish paraphrase pairs including their original document context, we find that our paraphrase span detection model outperforms two strong retrieval baselines (lexical similarity and BERT sentence embeddings) by 31.9pp and 22.4pp respectively in terms of exact match, and by 22.3pp and 12.9pp in terms of token-level F-score. This demonstrates a strong advantage of modelling the task in terms of span retrieval, rather than sentence similarity. Additionally, we introduce a method for creating artificial paraphrase data through…
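The two metrics named in the abstract, exact match and token-level F-score, can be illustrated with a minimal sketch. It assumes whitespace tokenization and the overlap-based F1 commonly used in extractive question answering; the paper's own tokenization and normalization are not given here, and the function names `exact_match` and `token_f1` are hypothetical.

```python
from collections import Counter


def exact_match(predicted: str, gold: str) -> bool:
    """True when the predicted span has exactly the gold span's tokens, in order."""
    # Assumption: whitespace tokenization, no case or punctuation normalization.
    return predicted.split() == gold.split()


def token_f1(predicted: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between two spans."""
    pred_tokens = predicted.split()
    gold_tokens = gold.split()
    # If either span is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

A predicted span that overlaps the gold span without matching its boundaries scores zero on exact match but receives partial credit on token-level F-score, so the two metrics are typically reported together.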
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Softmax · Adam · Residual Connection · Weight Decay · Linear Warmup With Linear Decay · Dense Connections
