SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines
Semih Yumusak, Erdogan Dogdu, Halife Kodaz, Andreas Kamilaris

TL;DR
This paper introduces SpEnD, a metacrawling approach that uses search engines to discover and monitor linked data sources and SPARQL endpoints on the Web. It recovers most of the endpoints listed in existing repositories and finds a significant number of new ones.
Contribution
The paper presents a new metacrawling method and a prototype system for discovering linked data sources and SPARQL endpoints using search engine queries and link analysis.
Findings
Discovered most of the SPARQL endpoints currently listed in existing endpoint repositories
Identified a significant number of new SPARQL endpoints
Developed a SPARQL endpoint crawler (SpEC) for link analysis
Abstract
In this study, a novel metacrawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system, named SPARQL Endpoints Discovery (SpEnD). SpEnD starts with a "search keyword" discovery process for finding relevant keywords for the linked data domain and specifically SPARQL endpoints. Then, these search keywords are utilized to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex). By using this method, most of the currently listed SPARQL endpoints in existing endpoint repositories, as well as a significant number of new SPARQL endpoints, have been discovered. Finally, we have developed a new SPARQL endpoint crawler (SpEC) for crawling and link analysis.
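The abstract does not say how a candidate URL returned by a search engine is confirmed to be a SPARQL endpoint. One plausible check, sketched here as an assumption and not as SpEnD's actual implementation, is to send a trivial `ASK` query over HTTP GET, as the SPARQL 1.1 Protocol allows, and test whether the response looks like a SPARQL result. The names `PROBE_QUERY`, `build_probe_url`, and `looks_like_sparql_result` are hypothetical; the HTTP request itself is left out so the sketch stays self-contained.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical probe query; any cheap, valid SPARQL query would serve.
PROBE_QUERY = "ASK { ?s ?p ?o }"


def build_probe_url(candidate_url: str) -> str:
    """Append a `query` parameter to the candidate URL (SPARQL query via GET)."""
    parts = urlsplit(candidate_url)
    # Preserve any parameters the candidate URL already carries.
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append(("query", PROBE_QUERY))
    return urlunsplit(parts._replace(query=urlencode(params)))


def looks_like_sparql_result(content_type: str, body: str) -> bool:
    """Heuristic: True if the response resembles a SPARQL ASK result."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/sparql-results+json":
        try:
            # An ASK result in JSON carries a top-level boolean member.
            return isinstance(json.loads(body).get("boolean"), bool)
        except (ValueError, AttributeError):
            return False
    if media_type == "application/sparql-results+xml":
        # An ASK result in XML carries a <boolean> element.
        return "<boolean>" in body
    return False
```

A caller would fetch `build_probe_url(url)` with an `Accept` header naming the SPARQL results media types, then pass the response's `Content-Type` and body to `looks_like_sparql_result`. This is only a heuristic: endpoints that require POST, authentication, or a different parameter layout would be missed.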
