Harmonizing Metadata of Language Resources for Enhanced Querying and Accessibility
Zixuan Liang

TL;DR
This paper presents a method for harmonizing metadata from various language resource repositories using linked data and RDF, enabling improved search, browsing, and querying through a new portal called Linghub, with promising initial results.
Contribution
It introduces a unified metadata model for language resources based on standards like DCAT and META-SHARE OWL, facilitating better interoperability and user access.
Findings
Linghub supports effective text-based search and faceted browsing.
User queries are largely satisfied despite some limitations.
The evaluation highlights the need for adherence to open standards in metadata harmonization.
Abstract
This paper addresses the harmonization of metadata from diverse repositories of language resources (LRs). Leveraging linked data and RDF techniques, we integrate data from multiple sources into a unified model based on DCAT and the META-SHARE OWL ontology. Our methodology supports text-based search, faceted browsing, and advanced SPARQL queries through Linghub, a newly developed portal. Real user queries from the Corpora Mailing List (CML) were evaluated to assess Linghub's capability to satisfy actual user needs. Results indicate that while some limitations persist, many user requests can be successfully addressed. The study highlights significant metadata issues and advocates for adherence to open vocabularies and standards to enhance metadata harmonization. This initial research underscores the importance of API-based access to LRs, promoting machine usability and data subset extraction…
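The harmonization idea in the abstract can be illustrated with a minimal sketch. It is not Linghub's implementation: the source names (`repo_a`, `repo_b`), their field names, the `urn:example:` identifiers, and the helper functions are all hypothetical. Only the property IRIs come from real vocabularies (Dublin Core Terms and DCAT). The sketch maps source-specific fields onto shared properties as subject-predicate-object triples, then counts values for one property, which is the basic operation behind a browsing facet.

```python
from collections import Counter

# Real vocabulary namespaces: Dublin Core Terms and W3C DCAT.
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"

# Hypothetical mappings from each source's own field names to shared properties.
FIELD_MAPS = {
    "repo_a": {
        "name": DCT + "title",
        "lang": DCT + "language",
        "url": DCAT + "landingPage",
    },
    "repo_b": {
        "resourceName": DCT + "title",
        "languageCode": DCT + "language",
        "link": DCAT + "landingPage",
    },
}


def harmonize(source, record_id, record):
    """Turn one source record into (subject, predicate, object) triples.

    Fields without a mapping for this source are dropped.
    """
    subject = f"urn:example:{source}:{record_id}"  # hypothetical identifier scheme
    mapping = FIELD_MAPS[source]
    return [(subject, mapping[key], value)
            for key, value in record.items() if key in mapping]


def facet_counts(triples, predicate):
    """Count distinct object values for one predicate (a simple facet)."""
    return Counter(obj for _, pred, obj in triples if pred == predicate)


# Made-up records, for illustration only.
triples = []
triples += harmonize("repo_a", "1", {"name": "Example Corpus", "lang": "en", "size": "10MB"})
triples += harmonize("repo_b", "7", {"resourceName": "Beispielkorpus", "languageCode": "de"})
triples += harmonize("repo_b", "8", {"resourceName": "Sample Lexicon", "languageCode": "en"})

language_facet = facet_counts(triples, DCT + "language")
```

Because both sources end up using the same property IRIs, a single query or facet works across them without knowing which repository a record came from. A real system would load such triples into an RDF store and expose them through SPARQL rather than filtering Python lists.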
Taxonomy
Topics: Natural Language Processing Techniques
