Scholarly Wikidata: Population and Exploration of Conference Data in Wikidata using LLMs
Nandana Mihindukulasooriya, Sanju Tiwari, Daniil Dobriy, Finn Årup Nielsen, Tek Raj Chhetri, Axel Polleres

TL;DR
This paper presents a semi-automated method that uses large language models to extract scholarly conference data and populate Wikidata with it, improving Wikidata's coverage and usefulness for the Semantic Web community.
Contribution
It introduces a semi-automated approach leveraging LLMs for extracting conference metadata and analyzes ontology gaps, significantly expanding Wikidata's scholarly data coverage.
Findings
Extended Wikidata with over 6000 new scholarly entities
Successfully extracted conference metadata with minimal manual validation
Method applicable beyond Semantic Web conferences
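The "minimal manual validation" step can be pictured with a small sketch. This is not the authors' pipeline, which this summary does not describe; it only illustrates one way to check LLM-extracted conference metadata before it is turned into Wikidata statements. The field names (`short_name`, `start_date`, `end_date`, `website`), the helper names, and the JSON record shape are hypothetical. The property IDs are standard Wikidata properties, but the choice of which ones to map is an assumption.

```python
import json
from datetime import date

# Assumed mapping from hypothetical extraction field names to Wikidata
# property IDs (P1813 short name, P580 start time, P582 end time,
# P856 official website).
FIELD_TO_PROPERTY = {
    "short_name": "P1813",
    "start_date": "P580",
    "end_date": "P582",
    "website": "P856",
}


def parse_llm_record(raw):
    """Parse the JSON text that an LLM returned for one conference."""
    return json.loads(raw)


def to_statements(record):
    """Return (statements, issues) for one extracted record.

    statements: list of (property_id, value) pairs that passed the checks.
    issues: human-readable notes for a reviewer to resolve manually.
    """
    statements, issues, parsed_dates = [], [], {}
    for field, prop in FIELD_TO_PROPERTY.items():
        value = record.get(field)
        if not value:
            # Empty or absent fields are flagged rather than guessed.
            issues.append(f"missing {field}")
            continue
        if field.endswith("_date"):
            try:
                parsed_dates[field] = date.fromisoformat(value)
            except (TypeError, ValueError):
                issues.append(f"invalid date in {field}: {value!r}")
                continue
        statements.append((prop, value))
    start, end = parsed_dates.get("start_date"), parsed_dates.get("end_date")
    if start and end and start > end:
        issues.append("start_date is after end_date")
    return statements, issues
```

Calling `to_statements(parse_llm_record(raw))` on a record yields the statements that are ready for upload together with a list of issues, so a human reviewer only needs to look at records whose issue list is non-empty.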
Abstract
Several initiatives have been undertaken to conceptually model the domain of scholarly data using ontologies and to create respective Knowledge Graphs. Yet the full potential has not been unleashed, as automated means for populating said ontologies are lacking, and the respective initiatives from the Semantic Web community are not necessarily connected. We propose to make scholarly data more accessible by leveraging Wikidata's infrastructure and automating its population in a sustainable manner through LLMs, tapping into unstructured sources such as conference Web sites and proceedings texts as well as existing structured conference datasets. While an initial analysis shows that Semantic Web conferences are only minimally represented in Wikidata, we argue that our methodology can help to populate, evolve and maintain scholarly data as a community within Wikidata.…
Taxonomy
Topics: Wikis in Education and Collaboration · Digital Rights Management and Security · Natural Language Processing Techniques
