Harmonizing Metadata of Language Resources for Enhanced Querying and Accessibility

Zixuan Liang

arXiv:2501.05606·cs.CL·January 13, 2025


TL;DR

This paper presents a method for harmonizing metadata from various language resource repositories using linked data and RDF, enabling improved search, browsing, and querying through a new portal called Linghub, with promising initial results.

Contribution

It introduces a unified metadata model for language resources based on standards like DCAT and META-SHARE OWL, facilitating better interoperability and user access.
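As a rough illustration of what mapping heterogeneous records onto one model can look like, here is a minimal sketch; it is not the Linghub implementation. It assumes two hypothetical source repositories (`source_a`, `source_b`) with different field names and maps them onto DCAT and Dublin Core Terms property IRIs as plain (subject, predicate, object) tuples. The `FIELD_MAP` table, the `harmonize` helper, the field names, and the `example.org` subject IRIs are all illustrative assumptions.

```python
# Sketch only: the source schemas, field names, and example.org IRIs are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

# Per-source mapping from local field names to shared property IRIs.
FIELD_MAP = {
    "source_a": {
        "name": DCT + "title",
        "lang": DCT + "language",
        "licence": DCT + "license",
    },
    "source_b": {
        "resourceName": DCT + "title",
        "languageId": DCT + "language",
    },
}


def harmonize(source, record_id, record):
    """Turn one source record into triples that use the shared vocabulary."""
    subject = f"http://example.org/{source}/{record_id}"
    triples = [(subject, RDF_TYPE, DCAT + "Dataset")]
    for field, value in record.items():
        predicate = FIELD_MAP[source].get(field)
        if predicate is not None:  # fields without a mapping are skipped
            triples.append((subject, predicate, value))
    return triples


triples_a = harmonize("source_a", "1", {"name": "Corpus A", "lang": "en", "internal_id": 7})
triples_b = harmonize("source_b", "x9", {"resourceName": "Lexicon B", "languageId": "de"})
```

After harmonization, both records expose their title and language under the same predicates, so a single query can range over both sources. A real pipeline would also need to normalize values (for example, language codes and license names), which this sketch leaves out.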

Findings

01

Linghub supports effective text-based search and faceted browsing.

02

Many real user queries from the Corpora Mailing List can be satisfied, although some limitations persist.

03

The study highlights the need to adhere to open vocabularies and standards when harmonizing metadata.

Abstract

This paper addresses the harmonization of metadata from diverse repositories of language resources (LRs). Leveraging linked data and RDF techniques, we integrate data from multiple sources into a unified model based on DCAT and the META-SHARE OWL ontology. Our methodology supports text-based search, faceted browsing, and advanced SPARQL queries through Linghub, a newly developed portal. Real user queries from the Corpora Mailing List (CML) were evaluated to assess Linghub's capability to satisfy actual user needs. Results indicate that, while some limitations persist, many user requests can be successfully addressed. The study highlights significant metadata issues and advocates for adherence to open vocabularies and standards to enhance metadata harmonization. This initial research underscores the importance of API-based access to LRs, promoting machine usability and data subset extraction…
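The text-based search and faceted browsing mentioned in the abstract can be sketched over already-harmonized records as follows. This is a minimal in-memory illustration under stated assumptions, not how Linghub is built: the `RECORDS` list, its field names, and the three helper functions are hypothetical, and SPARQL querying is not covered here.

```python
from collections import Counter

# Hypothetical harmonized records; field names are assumptions for illustration.
RECORDS = [
    {"title": "English News Corpus", "language": "en", "type": "corpus"},
    {"title": "German Lexicon", "language": "de", "type": "lexicon"},
    {"title": "English Learner Corpus", "language": "en", "type": "corpus"},
]


def text_search(records, query):
    """Return records whose title contains every query term (case-insensitive)."""
    terms = query.lower().split()
    return [r for r in records if all(t in r["title"].lower() for t in terms)]


def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    return Counter(r[facet] for r in records if facet in r)


def filter_by_facets(records, **facets):
    """Keep records that match every requested facet value exactly."""
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]


hits = text_search(RECORDS, "english corpus")
counts = facet_counts(RECORDS, "language")
german_lexica = filter_by_facets(RECORDS, language="de", type="lexicon")
```

Faceted filtering like this only works when every source uses the same field names and value conventions, which is why harmonization has to happen first.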


Taxonomy

Topics: Natural Language Processing Techniques