TL;DR
MMEAD is a resource of entity annotations and disambiguations (entity links to Wikipedia) for the MS MARCO document and passage collections. It ships as an easy-to-install Python package and supports information retrieval research and interactive search applications that use entity information.
Contribution
This paper introduces MMEAD, a resource of entity links for the MS MARCO datasets. It comprises a format specification for storing and sharing links, and a Python package for loading the link data and entity embeddings.
Findings
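Using MMEAD improves recall@1000 and MRR@10 on more complex queries on the MS MARCO v1 passage dataset.
Entity expansions can be used for interactive search applications.
Entity links to Wikipedia, produced by the REL and BLINK systems, are released for the documents and passages of both MS MARCO collections (v1 and v2).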
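To make the idea of entity expansion concrete, here is a minimal sketch of one simple way to expand a query with its linked entities. It is not the authors' method and does not use the MMEAD API: the `expand_query` function and the `mention`/`entity` dictionary keys are hypothetical names chosen for illustration, and the link itself is hand-written toy data.

```python
from typing import Dict, List


def expand_query(query: str, links: List[Dict[str, str]]) -> str:
    """Append tokens of linked entity titles that the query does not contain yet.

    `links` is a hypothetical structure: one dict per entity link, with the
    mention text under "mention" and the linked Wikipedia title under "entity".
    """
    terms = query.split()
    seen = {term.lower() for term in terms}
    for link in links:
        # Wikipedia titles commonly use underscores instead of spaces.
        for token in link["entity"].replace("_", " ").split():
            if token.lower() not in seen:
                terms.append(token)
                seen.add(token.lower())
    return " ".join(terms)


if __name__ == "__main__":
    toy_links = [{"mention": "apple", "entity": "Apple_Inc."}]
    print(expand_query("who founded apple", toy_links))
```

The expanded query can then be passed to any retrieval system in place of the original. Whether expansion helps depends on the query set and on the quality of the links.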
Abstract
MMEAD, or MS MARCO Entity Annotations and Disambiguations, is a resource for entity links for the MS MARCO datasets. We specify a format to store and share links for both document and passage collections of MS MARCO. Following this specification, we release entity links to Wikipedia for documents and passages in both MS MARCO collections (v1 and v2). Entity links have been produced by the REL and BLINK systems. MMEAD is an easy-to-install Python package, allowing users to load the link data and entity embeddings effortlessly. Using MMEAD takes only a few lines of code. Finally, we show how MMEAD can be used for IR research that uses entity information. We show how to improve recall@1000 and MRR@10 on more complex queries on the MS MARCO v1 passage dataset by using this resource. We also demonstrate how entity expansions can be used for interactive search applications.
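For readers unfamiliar with the two metrics named above, the following sketch shows how MRR@10 and recall@1000 are commonly computed from ranked result lists and relevance judgments. The function names and the toy data are assumptions made for illustration; this is not the evaluation code used in the paper.

```python
from typing import Dict, List, Set


def mrr_at_k(rankings: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 10) -> float:
    """Mean reciprocal rank of the first relevant result within the top k."""
    if not rankings:
        return 0.0
    total = 0.0
    for query_id, ranked in rankings.items():
        relevant = qrels.get(query_id, set())
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings)


def recall_at_k(rankings: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 1000) -> float:
    """Fraction of relevant items found in the top k, averaged over queries
    that have at least one relevant item."""
    scores = []
    for query_id, ranked in rankings.items():
        relevant = qrels.get(query_id, set())
        if not relevant:
            continue
        hits = len(relevant.intersection(ranked[:k]))
        scores.append(hits / len(relevant))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data, invented for this example.
    toy_rankings = {"q1": ["d3", "d1", "d2"], "q2": ["d5", "d4"]}
    toy_qrels = {"q1": {"d1"}, "q2": {"d4", "d9"}}
    print(mrr_at_k(toy_rankings, toy_qrels, k=10))
    print(recall_at_k(toy_rankings, toy_qrels, k=1000))
```

MRR@10 rewards placing a relevant result near the top of the list, while recall@1000 measures how many relevant results a first-stage retriever surfaces at all, so the two metrics capture different effects of adding entity information.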
