Linking Streets in OpenStreetMap to Persons in Wikidata
Daria Gurtovoy, Simon Gottschalk

TL;DR
This paper introduces StreetToPerson, a method for linking streets in OpenStreetMap to persons in Wikidata based on relations and spatial data, significantly improving connection accuracy and enabling large-scale mapping.
Contribution
The paper presents a novel approach for connecting OSM streets to Wikidata persons, outperforming existing methods by 26 percentage points and applying it to all streets in Germany.
Findings
Outperforms existing approaches by 26 percentage points
Identifies over 180,000 street-person links in Germany
Enables large-scale geographic-knowledge graph integration
Abstract
Geographic web sources such as OpenStreetMap (OSM) and knowledge graphs such as Wikidata are often unconnected. One connection that can be established between these sources is the link between a street in OSM and the person in Wikidata it was named after. This paper presents StreetToPerson, an approach for connecting streets in OSM to persons in a knowledge graph based on relations in the knowledge graph and spatial dependencies. Our evaluation shows that we outperform existing approaches by 26 percentage points. In addition, we apply StreetToPerson to all OSM streets in Germany, for which we identify more than 180,000 links between streets and persons.
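Table 1: Classification results on the Wikidata ground truth.

| Approach | Precision | Recall | F1 Score |
|---|---|---|---|
| TagMe | 0.49 | 0.45 | 0.47 |
| PopRank | 0.69 | 0.66 | 0.67 |
| RelRank (all entities) | 0.08 | 0.08 | 0.08 |
| RelRank (person entities) | 0.35 | 0.11 | 0.17 |
| StreetToPerson | 0.95 | 0.91 | 0.93 |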
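
Table 2: Application of StreetToPerson to OSM streets in Bremen, NRW and Germany.

| Number of | Bremen | NRW | Germany |
|---|---|---|---|
| Streets | | | |
| Streets with candidate persons | | | |
| Candidate persons | | | |
| Street-to-person relations | | | |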

Table 3: Correctness on OSM streets that carry a name:etymology tag.

| | Bremen | NRW |
|---|---|---|
| Precision | | |
| Recall | | |
| F1 Score | | |
Linking Streets in OpenStreetMap to Persons in Wikidata
Daria Gurtovoy, Universität Bonn, Germany
Simon Gottschalk, L3S Research Center, Leibniz Universität Hannover, Germany
(2022)
Abstract.
Geographic web sources such as OpenStreetMap (OSM) and knowledge graphs such as Wikidata are often unconnected. One connection that can be established between these sources is the link between a street in OSM and the person in Wikidata it was named after. This paper presents StreetToPerson, an approach for connecting streets in OSM to persons in a knowledge graph based on relations in the knowledge graph and spatial dependencies. Our evaluation shows that we outperform existing approaches by 26 percentage points. In addition, we apply StreetToPerson to all OSM streets in Germany, for which we identify more than 180,000 links between streets and persons.
Keywords: Knowledge graphs, OpenStreetMap, Wikidata, Street names
Published in: Companion Proceedings of the Web Conference 2022 (WWW ’22 Companion), April 25–29, 2022, Virtual Event, Lyon, France. ISBN: 978-1-4503-9130-6/22/04. DOI: 10.1145/3487553.3524267
CCS concepts: Information systems → Spatial-temporal systems; Information systems → Information integration
1. Introduction
Streets are often named after famous or distinguished individuals who may or may not have a direct connection to the specific location. While geographic data sources such as OpenStreetMap (OSM) contain information about streets across the globe and knowledge graphs such as Wikidata contain information about famous individuals and their relations, these two worlds are often not connected. This makes it challenging to connect streets to the persons they are named after. (OpenStreetMap, OSM and the OpenStreetMap magnifying glass logo are trademarks of the OpenStreetMap Foundation, and are used with their permission. We are not endorsed by or affiliated with the OpenStreetMap Foundation.)
For example, consider the street named “Wilhelmstraße” in the German capital Berlin. After the removal of the suffix “straße” (German for “street”), only the term “Wilhelm” remains. This term can potentially be linked to a variety of persons, be it via their first name (e.g., Wilhelm Busch), their second name (e.g., Friedrich Wilhelm I.) or their last name (e.g., Paul Wilhelm). With the help of Wikidata, we can infer that Friedrich Wilhelm I. was born in Berlin and was a historically important monarch. Both factors may lead to the correct selection of Friedrich Wilhelm I. for the Wilhelmstraße.
Providing links between streets in OSM and persons in Wikidata can support several use cases, including: (i) Bridging the gap between OSM and knowledge graphs: By bringing OSM and Wikidata together, they will mutually benefit from their strengths (Tempelmeier and Demidova, 2021). Knowledge graphs such as WorldKG (Dsouza et al., 2021b) and the Nuremberg Address Knowledge Graph (Bruns et al., 2021) are examples of such efforts. (ii) Providing data for touristic applications: Tourists are often interested in exploring the history behind a place, which can often be told through people who have a special significance in that place. An existing example is an application for studying the backgrounds behind the Stolpersteine in Germany, which are dedicated to victims of the Nazi regime (https://stolpersteine.wdr.de/web/en/). (iii) Enabling cultural analyses: With street-to-person links available, it becomes possible to analyse the motives behind naming streets, potentially revealing discrimination against women (Ouali et al., 2021), and to detect quarters dedicated to specific groups of persons (Dias Almeida et al., 2016).
In contrast to named entity linking approaches on text, no context information is available beyond the street’s name and location. Therefore, the task of linking streets in OSM to persons in Wikidata lies in identifying the correct person based on features such as the person’s popularity and the person’s relatedness to the street’s location. We propose StreetToPerson, which first builds an index for retrieving potential candidate persons given a street name. For each such candidate person, we extract a set of features denoting specific characteristics of that person, such as their popularity and their spatial relatedness to the given street, and classify the candidate as correct or not.
In our evaluation, we first show StreetToPerson’s superiority over existing baselines using a ground truth extracted from German streets in Wikidata, and then apply StreetToPerson to German streets in OSM. StreetToPerson reaches a precision of 0.95 on the ground truth and identifies more than 180,000 street-to-person relations.
The remainder of this paper is structured as follows: In Section 2, we present related work. Then, in Section 3, we describe our approach and evaluate it in Section 4. Finally, we provide a conclusion in Section 5.
2. Related Work
Linking streets to persons is related to named entity linking and connecting geographic data sources with knowledge graphs.
2.1. Named Person Entity Linking
Most related to this paper is the work by Dias Almeida et al. (Dias Almeida et al., 2016), who propose a ranking-based approach to connect streets to the persons they were named after based on a manually defined relevance score. Users then confirm the generated street-to-person pairs in a web interface. Geiß and Gertz (Geiß and Gertz, 2016) link mentions of persons in a text document to Wikidata using a network of candidate persons. In general, street-to-person linking can be considered a variation of named entity linking to Wikidata (Möller et al., 2022).
2.2. Connecting OSM with Knowledge Graphs
The task of connecting geographic data sources such as OSM to knowledge graphs has been addressed from different perspectives. Typically, approaches aim at establishing identity links between the different representations of geographic entities and concepts in these sources. For example, Tempelmeier and Demidova (2021) propose a pipeline for link discovery between OSM, Wikidata and DBpedia based on OSM tags, Dsouza et al. (2021a) align the schema between these sources using an adversarial classifier, and osm2rdf converts the whole OpenStreetMap data to RDF triples (Bast et al., 2021). In contrast to these approaches, our task deals with persons, a class of entities not present in OSM.
3. Approach
The goal of this paper is to link streets to the correct person whom they are named after. Formally, we develop a street-to-person mapping function as follows:
Definition 3.1.
Street-to-person mapping function. Given a street $s$ and a set of persons $P$, create a street-to-person mapping function $s \mapsto p \in P$ that identifies the person $p$ after whom the street $s$ is named.
Figure 1 illustrates the approach and its components. First, the name of the given street is truncated to potentially only contain a person’s name. This term is then used in a candidate generation step to retrieve a set of candidate persons out of a set of persons extracted from Wikidata. For each pair $(s, p)$ of the street $s$ and a candidate person $p$, we extract various features and use a street-to-person classifier that determines the candidate with the highest probability as the correct person for the given street, where the street can be of any type in any region. In the example given in Figure 1, the street “Wilhelmstraße” is truncated to “Wilhelm”, for which the preprocessed knowledge graph returns the candidates “Paul Wilhelm”, “Wilhelm Busch” and “Friedrich Wilhelm I.”, among others. After extracting all candidate features, the classifier determines the latter to be the best candidate for the given street.
The implementation of this pipeline as well as the generated links are available on GitHub (https://github.com/d-gurtovoy/streetnameLinks).
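The final selection step described above can be sketched in a few lines of Python. This is only an illustration and not the authors' implementation: the function name `select_person`, the probability values and the `threshold` parameter are assumptions introduced here, and the classifier that would produce the probabilities is left out.

```python
from typing import Dict, Optional


def select_person(candidate_probabilities: Dict[str, float],
                  threshold: float = 0.5) -> Optional[str]:
    """Return the candidate with the highest classifier probability.

    `threshold` is a hypothetical cut-off: if even the best candidate is
    classified as "not correct", no person is assigned to the street.
    """
    if not candidate_probabilities:
        return None
    best = max(candidate_probabilities, key=candidate_probabilities.get)
    if candidate_probabilities[best] < threshold:
        return None
    return best


# Made-up probabilities for the street "Wilhelmstraße" in Berlin.
scores = {
    "Paul Wilhelm": 0.05,
    "Wilhelm Busch": 0.30,
    "Friedrich Wilhelm I.": 0.90,
}
print(select_person(scores))
```

Returning `None` when no candidate is convincing reflects that many streets are not named after a person at all; whether the original pipeline uses such a cut-off is an assumption of this sketch.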
3.1. Knowledge Graph Preprocessing
The data required to retrieve candidates and their features is taken from Wikidata. For efficient access to relevant information, we create the following data sets from Wikidata in a preprocessing step (to extract these data sets, we process the Wikidata dump using the qwikidata library, https://qwikidata.readthedocs.io/):
- The ground truth for training and evaluation contains known streets in Germany that are named after a person. We provide details about this dataset in Section 4.1.
- The person index takes a term as input and returns the Wikidata IDs of all people that match it. This person index contains over 4 million names of over 9 million persons.
- The person occupation index contains all occupations of a person (e.g., monarch or writer).
- The person location index contains relevant locations such as a person’s birthplace.
- The spatial dependency database denotes containment relations between locations (e.g., Berlin is located in Germany).
Figure 2 shows relations of Friedrich Wilhelm I. retrieved from the person occupation index and the person location index. (Over time, Friedrich Wilhelm I. has been buried in different churches.)
3.2. Street Name Truncation
When analysing a street name, it is essential to differentiate between the part that maps to a person and street affixes (i.e., prefixes and suffixes), such as “street”, “road”, and “avenue”. Given an extensive list of common street affixes, these affixes can be removed from the street name so that only the person name remains, which can then be used to look up matches in the person index. To obtain such a list, we gathered the streets of the ground truth and removed all names and aliases of the persons associated with them. The remaining parts of the street names were then manually split into prefixes and suffixes and corrected when necessary. The result is a set of 80 German suffixes and 34 prefixes, which we make available (https://github.com/d-gurtovoy/streetnameLinks/tree/master/data/affixes). Using this set of affixes, the name of a street is truncated so that street affixes are removed and the part that potentially maps to a person name remains.
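A minimal sketch of such a truncation step is given below. The affix tuples are a small illustrative subset chosen for this example (only “straße” and “weg” are mentioned in this paper; the others are assumptions), not the published list of 80 suffixes and 34 prefixes, and the function name `truncate_street_name` is hypothetical.

```python
from typing import Sequence

# Illustrative subset only; the published affix list is much longer.
SUFFIXES = ("straße", "strasse", "weg", "allee", "platz")
PREFIXES = ("am ", "an der ")


def truncate_street_name(name: str,
                         prefixes: Sequence[str] = PREFIXES,
                         suffixes: Sequence[str] = SUFFIXES) -> str:
    """Strip one known prefix and one known suffix from a street name."""
    term = name.strip()
    # Try longer affixes first so that "strasse" wins over shorter matches.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if term.lower().startswith(prefix):
            term = term[len(prefix):]
            break
    for suffix in sorted(suffixes, key=len, reverse=True):
        lowered = term.lower()
        # Keep names that consist of the affix alone (e.g. "Weg").
        if lowered.endswith(suffix) and len(lowered) > len(suffix):
            term = term[: len(term) - len(suffix)]
            break
    # Remove separators left over from hyphenated names.
    return term.strip(" -")


print(truncate_street_name("Wilhelmstraße"))
print(truncate_street_name("Konrad-Adenauer-Allee"))
```

The result of this function is the term that would be passed to the person index in the next step.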
3.3. Candidate Retrieval
The truncated street name is used to query the person index, which returns the Wikidata IDs of all matching candidate persons. For instance, the term “Wilhelm” would return the ID of “Friedrich Wilhelm I.” but also that of “Wilhelm Busch”, among others.
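The lookup can be pictured as an inverted index from name parts to person IDs. The sketch below is an assumption about how such an index might be organised, not a description of the published code; the identifiers `Q_A`, `Q_B` and `Q_C` are placeholders rather than real Wikidata IDs, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_person_index(persons: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map every full name, alias and single name token to person IDs."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for person_id, names in persons.items():
        for name in names:
            index[name.lower()].add(person_id)      # full name or alias
            for token in name.lower().split():
                index[token].add(person_id)         # first, middle, last name
    return index


def retrieve_candidates(index: Dict[str, Set[str]], term: str) -> List[str]:
    """Return the sorted IDs of all persons matching the truncated name."""
    return sorted(index.get(term.lower(), set()))


# Placeholder IDs; a real index would hold Wikidata IDs.
persons = {
    "Q_A": ["Friedrich Wilhelm I."],
    "Q_B": ["Wilhelm Busch"],
    "Q_C": ["Paul Wilhelm"],
}
index = build_person_index(persons)
print(retrieve_candidates(index, "Wilhelm"))
```

Because the term “Wilhelm” appears as a first, second or last name, all three placeholder persons are returned, which illustrates why a subsequent classification step is needed.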
3.4. Feature Extraction
After truncating the street name and retrieving candidates, we extract characteristics of each candidate person $p$ and the spatial relations between $p$ and the street $s$ that serve as features for training a binary classifier. The following features are extracted for every street-candidate pair $(s, p)$:
- Link count: The number of links pointing to $p$ in the German Wikipedia. If the candidate does not exist in Wikipedia but only in Wikidata, this feature value is set to 0.
- Name: These four binary features show which part of the person’s name is contained in the name of $s$: the full name, the first name, the last name, or an alias.
- Occupations: From the person occupation index, we gather the most common occupations of people that German streets were named after and introduce binary features to mark whether the person held any of these positions.
- Spatial relations: We introduce five numerical features representing whether there is a spatial relation between $s$ and $p$: “born”, “died”, “buried”, “educated at”, and “work location”.
The spatial relations feature values represent the extent to which the street and the person’s related location are contained within each other. These feature values range from 0 to 1, with 0 meaning that the given location is either unknown or lies outside of Germany, and 1 meaning that the location is the street itself. If a person is related to multiple locations for the same relation, we take the location with the highest score.
Figure 3 illustrates an example of computing the containment of the street “Wilhelmstraße” and Berlin, the birthplace of Friedrich Wilhelm I. We create a containment chain for both locations from the spatial dependency database and check their overlap. In this example, the value of the “born” relation feature is derived from that overlap.
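One way to turn the overlap of two containment chains into a score is sketched below. The scoring rule used here (shared chain elements divided by the length of the street's chain) is an assumption made for illustration and is not taken from the paper; the toy containment data and the names `containment_chain`, `containment_score` and `relation_feature` are hypothetical as well.

```python
from typing import Dict, Iterable, List, Optional


def containment_chain(location: str, parent_of: Dict[str, str]) -> List[str]:
    """Follow containment relations upwards, e.g. street -> city -> country."""
    chain = [location]
    seen = {location}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:          # guard against cyclic data
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def containment_score(street: str, location: Optional[str],
                      parent_of: Dict[str, str]) -> float:
    """Assumed rule: share of the street's chain also found in the location's chain."""
    if location is None:            # unknown location
        return 0.0
    street_chain = containment_chain(street, parent_of)
    location_chain = set(containment_chain(location, parent_of))
    shared = [place for place in street_chain if place in location_chain]
    return len(shared) / len(street_chain)


def relation_feature(street: str, locations: Iterable[Optional[str]],
                     parent_of: Dict[str, str]) -> float:
    """With several locations for one relation, keep the highest score."""
    return max((containment_score(street, loc, parent_of) for loc in locations),
               default=0.0)


# Toy containment data, simplified for illustration.
parent_of = {
    "Wilhelmstraße": "Mitte", "Mitte": "Berlin", "Berlin": "Germany",
    "Hannover": "Lower Saxony", "Lower Saxony": "Germany",
    "Paris": "France",
}
print(containment_score("Wilhelmstraße", "Berlin", parent_of))
```

Under this assumed rule, a location outside Germany shares no element with the street's chain and scores 0, while the street itself scores 1, which matches the range described above; intermediate values depend entirely on the chosen rule.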
4. Evaluation
This section introduces the data used for training and testing and evaluates StreetToPerson against three baselines.
4.1. Data
We require a dataset of known street-to-person links to train and evaluate our approach. For a few cases, Wikidata and OSM provide this information. Wikidata contains German streets connected to a person via Wikidata’s “named after” property (https://www.wikidata.org/wiki/Property:P138). We use these street-to-person relations as positive examples for training our model. As negative examples, we retrieve, for each of these streets, a limited number of candidate persons with the highest link counts.
OSM provides a tag called “name:etymology” (https://wiki.openstreetmap.org/wiki/Key:name:etymology), which links streets to persons in Wikipedia; we use Wikidata sitelinks to link persons in Wikipedia to Wikidata. This key provides street-to-person relations involving German streets.
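The construction of positive and negative training examples from the Wikidata ground truth can be sketched as follows. The function name, the toy data and the parameter `max_negatives` are assumptions; in particular, the number of negatives per street used in the paper is not reproduced here.

```python
from typing import Dict, List, Tuple


def build_training_examples(named_after: Dict[str, str],
                            candidates: Dict[str, List[str]],
                            link_count: Dict[str, int],
                            max_negatives: int) -> List[Tuple[str, str, int]]:
    """Create (street, person, label) triples with label 1 = correct person."""
    examples = []
    for street, true_person in named_after.items():
        examples.append((street, true_person, 1))
        # Negatives: the most popular wrong candidates of the same street.
        negatives = [p for p in candidates.get(street, []) if p != true_person]
        negatives.sort(key=lambda p: link_count.get(p, 0), reverse=True)
        for person in negatives[:max_negatives]:
            examples.append((street, person, 0))
    return examples


# Toy data with made-up link counts.
named_after = {"Wilhelmstraße": "Friedrich Wilhelm I."}
candidates = {"Wilhelmstraße": ["Paul Wilhelm", "Wilhelm Busch",
                                "Friedrich Wilhelm I."]}
link_count = {"Paul Wilhelm": 3, "Wilhelm Busch": 800,
              "Friedrich Wilhelm I.": 1200}
examples = build_training_examples(named_after, candidates, link_count,
                                   max_negatives=1)
print(examples)
```

Choosing the most popular wrong candidates as negatives gives the classifier hard examples, since popularity alone would otherwise be a strong signal.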
4.2. Baselines
We compare StreetToPerson to three different baselines:
- TagMe (Ferragina and Scaiella, 2010) (TagMe): TagMe is a traditional entity linking approach on short text fragments. As it may link a street name to the actual street entity and not the person it is named after, we feed TagMe with the street name after applying street name truncation.
- Popularity Ranking (PopRank): In a simpler version of StreetToPerson, we replace the street-to-person classifier by taking the person with the highest link count.
- Relevance Ranking (Dias Almeida et al., 2016) (RelRank): The approach by Dias Almeida et al. described in Section 2 is, to the best of our knowledge, the only existing approach for street-to-person linking. As RelRank does not limit its results to person entities, we consider two configurations: RelRank (all entities) and RelRank (person entities), where we only consider links to person entities.
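Of these baselines, PopRank is simple enough to sketch directly: it picks the candidate with the highest link count and ignores all other features. The function name `pop_rank` and the link counts below are made up for illustration.

```python
from typing import Dict, List, Optional


def pop_rank(candidates: List[str], link_count: Dict[str, int]) -> Optional[str]:
    """Return the candidate with the highest link count (0 if unknown)."""
    if not candidates:
        return None
    return max(candidates, key=lambda person: link_count.get(person, 0))


# Made-up link counts for two candidates of "Grevesmühlweg".
counts = {"Maria Grevesmühl": 40, "Hermann Grevesmühl": 25}
print(pop_rank(["Maria Grevesmühl", "Hermann Grevesmühl"], counts))
```

With these invented counts the baseline would choose the more linked person regardless of any spatial relation, which is the kind of error the additional features of StreetToPerson are meant to reduce.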
4.3. Evaluation of the Classification
We evaluate StreetToPerson’s classification performance on the Wikidata ground truth described in Section 4.1 using cross-validation. Results are shown in Table 1. StreetToPerson clearly outperforms the other approaches and reaches a precision of 0.95, 26 percentage points more than the second-best baseline, PopRank. The low recall of RelRank can be explained by its limited number of German affixes (“straße” and “weg”), its small number of features and its manually created relevance criterion.
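As a quick consistency check, the F1 scores reported in Table 1 for PopRank and StreetToPerson can be recomputed from precision and recall, and the 26-point gap follows from the two precision values. The snippet below only uses the numbers reported in Table 1; the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in Table 1.
reported = {
    "PopRank": (0.69, 0.66, 0.67),
    "StreetToPerson": (0.95, 0.91, 0.93),
}
for name, (precision, recall, f1) in reported.items():
    print(name, round(f1_score(precision, recall), 2), f1)

# Precision gap between StreetToPerson and PopRank in percentage points.
gap = round((0.95 - 0.69) * 100)
print(gap)
```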
4.4. Application on OpenStreetMap
In a second step of the evaluation, we demonstrate how StreetToPerson can be used for identifying street-to-person relations between OpenStreetMap and Wikidata that are not yet contained in these sources. To this end, we apply StreetToPerson to the whole of Germany and to two selected German states: the most populated state, North Rhine-Westphalia (NRW), and the least populated state, Bremen. From OSM, we select all streets contained in these regions and then apply StreetToPerson as depicted in Figure 1. Table 2 shows the results: For more than half of the streets in NRW, the person index returned at least one candidate person. After applying the street-to-person classifier, street-to-person relations were identified in both states. In the whole of Germany, more than 180,000 street-to-person relations are identified. This table also emphasizes the difficulty of identifying the correct person candidate: the number of candidate persons found for the streets in Germany is in the millions. An example of a false classification is the street “Grevesmühlweg” in Bremen, which is wrongly assigned to Maria Grevesmühl but should be assigned to her father Hermann Grevesmühl. Both were musicians who were born and died in Bremen and are thus highly similar.
Finally, we measure the correctness of the identified street-to-person pairs based on a subset of the streets in Table 2, namely those that are assigned to a person in OSM through the name:etymology key. Table 3 shows the results on these subsets for Bremen and NRW. The precision obtained for both states confirms the generalizability of our approach from Wikidata to OSM.
5. Conclusion and Future Work
In this paper, we have presented StreetToPerson, an approach for connecting streets in OpenStreetMap to those persons in Wikidata whom they were named after. Through a combination of knowledge graph features and spatial features, StreetToPerson precisely identifies new relations. In future work, we plan to enrich the feature space through the utilisation of graph-based embeddings and to extend our approach to further countries and other relations between geographic entities in OpenStreetMap and knowledge graph entities.
Acknowledgements.
This work was partially funded by the Federal Ministry of Education and Research (BMBF), Germany under “Simple-ML” (01IS18054) and the DFG, German Research Foundation, under “WorldKG” (424985896).
References
- Hannah Bast, Patrick Brosi, Johannes Kalmbach, and Axel Lehmann. 2021. An Efficient RDF Converter and SPARQL Endpoint for the Complete OpenStreetMap Data. In The 29th International Conference on Advances in Geographic Information Systems. 536–539.
- Oleksandra Bruns, Tabea Tietz, Mehdi Ben Chaabane, Manuel Portz, Felix Xiong, and Harald Sack. 2021. The Nuremberg Address Knowledge Graph. In Extended Semantic Web Conference (ESWC) (Lecture Notes in Computer Science, Vol. 12739). Springer, 115–119.
- Paulo Dias Almeida, Jorge Rocha, Andrea Ballatore, and Alexander Zipf. 2016. Where the Streets Have Known Names. In International Conference on Computational Science and Its Applications (ICCSA ’16), Vol. 9789. 1–12. https://doi.org/10.1007/978-3-319-42089-9_1
- Alishiba Dsouza, Nicolas Tempelmeier, and Elena Demidova. 2021a. Towards Neural Schema Alignment for OpenStreetMap and Knowledge Graphs. In International Semantic Web Conference (ISWC). Springer, 56–73.
- Alishiba Dsouza, Nicolas Tempelmeier, Ran Yu, Simon Gottschalk, and Elena Demidova. 2021b. WorldKG: A World-Scale Geographic Knowledge Graph. In Conference on Information and Knowledge Management (CIKM). ACM, 4475–4484.
- Paolo Ferragina and Ugo Scaiella. 2010. TAGME: On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities). In Conference on Information and Knowledge Management (CIKM). 1625–1628.
- Johanna Geiß and Michael Gertz. 2016. With a Little Help from my Neighbors: Person Name Linking Using the Wikipedia Social Network. In International Conference on World Wide Web (WWW). ACM, 985–990.
