TL;DR
This paper presents a passage retrieval approach for Polish texts that uses BM25 for initial retrieval and an ensemble of multilingual Cross Encoders for reranking. It was a winning solution to PolEval 2023 Task 3, which covers the trivia, legal, and customer support domains.
Contribution
It introduces a two-stage retrieval system that combines BM25 first-stage retrieval with an ensemble of Cross Encoder rerankers, and evaluates how well it transfers from the trivia training domain to the legal and customer support domains.
Findings
BM25 effectively retrieves relevant passages in Polish.
An ensemble of Cross Encoders improves reranking performance.
Fine-tuning the rerankers slightly improves performance in the training domain but worsens it in the other domains.
Abstract
Passage retrieval has traditionally relied on lexical methods such as TF-IDF and BM25. Recently, some neural network models have surpassed these methods in performance. However, these models face challenges, such as the need for large annotated datasets and the difficulty of adapting to new domains. This paper presents a winning solution to PolEval 2023 Task 3: Passage Retrieval, which involves retrieving passages of Polish text in three domains: trivia, legal, and customer support. Training and development data were available only for the trivia domain. The method uses the Okapi BM25 algorithm to retrieve documents and an ensemble of publicly available multilingual Cross Encoders for reranking. Fine-tuning the reranker models slightly improved performance in the training domain but worsened it in the other domains.
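The retrieve-then-rerank pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the `k1` and `b` values, the toy passages, and the score-averaging rule are all assumptions, and the two scorer functions are hypothetical stand-ins for real multilingual Cross Encoders, which would score each (query, passage) pair with a neural model instead.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system for
    # Polish would likely add lemmatization or stemming.
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 index over a list of passages."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in corpus]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        df = Counter(t for d in self.docs for t in set(d))
        n = len(self.docs)
        # Smoothed IDF that stays non-negative for frequent terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in tokenize(query)
            if t in tf
        )

    def top_k(self, query, k):
        order = sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


# Hypothetical stand-ins for Cross Encoder rerankers: each maps a
# (query, passage) pair to a relevance score.
def overlap_scorer(query, passage):
    q, p = set(tokenize(query)), set(tokenize(passage))
    return len(q & p) / len(q | p) if q | p else 0.0


def prefix_scorer(query, passage):
    # Crude 4-character prefix matching to tolerate Polish inflection.
    q = {t[:4] for t in tokenize(query)}
    p = {t[:4] for t in tokenize(passage)}
    return len(q & p) / len(q) if q else 0.0


def rerank(query, passages, candidate_ids, scorers):
    # Assumption: the ensemble score is the plain mean of the scorers.
    def ensemble(i):
        return sum(s(query, passages[i]) for s in scorers) / len(scorers)

    return sorted(candidate_ids, key=ensemble, reverse=True)


passages = [
    "Warszawa jest stolicą Polski",
    "Kraków leży nad Wisłą",
    "Umowa najmu wymaga formy pisemnej",
]
query = "stolica Polski"

bm25 = BM25(passages)
candidates = bm25.top_k(query, k=2)  # stage 1: lexical retrieval
reranked = rerank(query, passages, candidates, [overlap_scorer, prefix_scorer])  # stage 2
print(candidates, reranked)
```

The two stages are kept separate because BM25 is cheap enough to score the whole collection, while pair-wise rerankers are typically applied only to the short candidate list it returns.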
