Assessing the quality of sources in Wikidata across languages: a hybrid approach
Gabriel Amaral, Alessandro Piscopo, Lucie-Aimée Kaffee, Odinaldo Rodrigues, Elena Simperl

TL;DR
This study evaluates the quality of references in Wikidata across multiple languages using crowdsourcing, statistical analysis, and machine learning to identify challenges and suggest improvements for ensuring data credibility.
Contribution
It introduces a hybrid approach combining crowdsourcing, statistics, and machine learning to assess reference quality in multilingual Wikidata at scale.
Findings
Identified common challenges in reference quality across languages
Developed machine learning models to evaluate reference credibility
Provided recommendations for improving reference quality practices
Abstract
Wikidata is one of the most important sources of structured data on the web, built by a worldwide community of volunteers. As a secondary source, its contents must be backed by credible references; this is particularly important as Wikidata explicitly encourages editors to add claims for which there is no broad consensus, as long as they are corroborated by references. Nevertheless, despite this essential link between content and references, Wikidata's ability to systematically assess and assure the quality of its references remains limited. To this end, we carry out a mixed-methods study to determine the relevance, ease of access, and authoritativeness of Wikidata references, at scale and in different languages, using online crowdsourcing, descriptive statistics, and machine learning. Building on previous work of ours, we run a series of microtasks experiments to evaluate a large…
Taxonomy
Topics: Natural Language Processing Techniques · Wikis in Education and Collaboration · Semantic Web and Ontologies
