SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines

Semih Yumusak; Erdogan Dogdu; Halife Kodaz; Andreas Kamilaris

arXiv:1608.02761·cs.IR·April 11, 2017

TL;DR

This paper introduces SpEnD, a metacrawling approach that uses popular search engines (Google, Bing, Yahoo, Yandex) to discover and monitor linked data sources and SPARQL endpoints on the Web.

Contribution

The paper presents a new metacrawling method and a prototype system for discovering linked data sources and SPARQL endpoints using search engine queries and link analysis.

Findings

01. Discovered most of the SPARQL endpoints already listed in existing endpoint repositories

02. Identified a significant number of new SPARQL endpoints

03. Developed a SPARQL endpoint crawler (SpEC) for crawling and link analysis

Abstract

In this study, a novel metacrawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system, named SPARQL Endpoints Discovery (SpEnD). SpEnD starts with a "search keyword" discovery process for finding relevant keywords for the linked data domain and specifically SPARQL endpoints. Then, these search keywords are utilized to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex). By using this method, most of the currently listed SPARQL endpoints in existing endpoint repositories, as well as a significant number of new SPARQL endpoints, have been discovered. Finally, we have developed a new SPARQL endpoint crawler (SpEC) for crawling and link analysis.
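
The pipeline described in the abstract (collect candidate URLs from several search engines, then decide which ones are SPARQL endpoints) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the de-duplication rule, and the idea of probing each candidate with a trivial ASK query are assumptions made for illustration only. The sketch leaves out the search engine calls and the HTTP request itself and shows only the pure helper logic around them.

```python
import json
from urllib.parse import urlencode, urlsplit, urlunsplit

# Trivial probe query (an assumption; the paper's own probe is not shown here).
PROBE_QUERY = "ASK { ?s ?p ?o }"


def dedupe_candidates(urls):
    """Merge candidate URLs from several search engines (hypothetical helper).

    Keeps only http/https URLs and treats two URLs as the same candidate
    when host (case-insensitive) and path (ignoring a trailing slash) match.
    """
    seen = set()
    unique = []
    for url in urls:
        parts = urlsplit(url.strip())
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        host = parts.netloc.lower()
        path = parts.path.rstrip("/")
        if (host, path) not in seen:
            seen.add((host, path))
            unique.append(urlunsplit((parts.scheme, host, path, "", "")))
    return unique


def build_probe_url(candidate):
    """Append the probe query as a `query` parameter (SPARQL query via GET)."""
    parts = urlsplit(candidate)
    query = urlencode({"query": PROBE_QUERY})
    if parts.query:
        # Preserve parameters the candidate URL already carries.
        query = parts.query + "&" + query
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def looks_like_sparql_answer(content_type, body):
    """Heuristic check that a response resembles a SPARQL ASK result."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/sparql-results+json":
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            return False
        return isinstance(data, dict) and "boolean" in data
    if media_type == "application/sparql-results+xml":
        return "<boolean>" in body
    return False
```

In use, each URL returned by `dedupe_candidates` would be passed to `build_probe_url`, fetched with an HTTP client of choice, and the response's content type and body handed to `looks_like_sparql_answer`. A real crawler would also need timeouts, retries, and politeness limits, none of which are covered here.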
