HeGeL: A Novel Dataset for Geo-Location from Hebrew Text
Tzuf Paz-Argaman, Tal Bauman, Itai Mondshine, Itzhak Omer, Sagi Dalyot, Reut Tsarfaty

TL;DR
This paper introduces HeGeL, a new Hebrew dataset for textual geolocation that emphasizes natural language understanding and geospatial reasoning, addressing the lack of resources for morphologically rich languages.
Contribution
The creation of the HeGeL corpus with 5,649 Hebrew place descriptions, enabling research on geolocation in resource-poor, morphologically rich languages.
Findings
The data shows extensive use of geospatial reasoning.
Geolocating the descriptions requires a novel environmental representation.
The corpus highlights the challenges of geolocation from Hebrew text.
Abstract
The task of textual geolocation - retrieving the coordinates of a place based on a free-form language description - calls for not only grounding but also natural language understanding and geospatial reasoning. Even though there are quite a few datasets in English used for geolocation, they are currently based on open-source data (Wikipedia and Twitter), where the location of the described place is mostly implicit, such that the location retrieval resolution is limited. Furthermore, there are no datasets available for addressing the problem of textual geolocation in morphologically rich and resource-poor languages, such as Hebrew. In this paper, we present the Hebrew Geo-Location (HeGeL) corpus, designed to collect literal place descriptions and analyze lingual geospatial reasoning. We crowdsourced 5,649 literal Hebrew place descriptions of various place types in three cities in Israel.…
Taxonomy
Topics: Geographic Information Systems Studies · Natural Language Processing Techniques
