Data Curation APIs
Seyed-Mehdi-Reza Beheshti, Alireza Tabebordbar, Boualem Benatallah, Reza Nouri

TL;DR
This paper introduces a set of data curation APIs designed to facilitate transforming raw, unstructured data into structured, meaningful, and ready-to-use curated data, thereby enhancing data analysis and interpretation.
Contribution
The paper presents and shares open-source curation APIs that automate key data processing tasks, improving efficiency and accuracy in data transformation workflows.
Findings
APIs support extraction of keywords, entities, and synonyms.
APIs enable linking to external knowledge bases like Wikidata.
APIs assist in data classification, sorting, and indexing.
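To make the extraction finding concrete, here is a minimal sketch of what a keyword-extraction step might look like. It is not the paper's API: the function name `extract_keywords`, the `top_k` parameter, and the small stopword list are all hypothetical, and simple term frequency stands in for whatever method the published APIs actually use.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real pipeline would use a fuller one).
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is",
    "for", "on", "with", "that", "this", "it", "as", "by",
}


def extract_keywords(text: str, top_k: int = 5) -> list[str]:
    """Return up to top_k frequent non-stopword terms (hypothetical helper)."""
    # Lowercase the text and keep alphabetic runs only.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count tokens that are not stopwords and are longer than two characters.
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    # Rank by descending frequency, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


if __name__ == "__main__":
    sample = "Data curation turns raw data into curated data for analytics"
    print(extract_keywords(sample, top_k=2))
```

A curation pipeline would typically feed such keywords into later steps, for example entity linking or indexing. Those steps are left out here because the summary does not describe their interfaces.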
Abstract
Understanding and analyzing big data is firmly recognized as a powerful and strategic priority. For deeper interpretation of and better intelligence with big data, it is important to transform raw data (unstructured, semi-structured and structured data sources, e.g., text, video, image data sets) into curated data: contextualized data and knowledge that is maintained and made available for use by end-users and applications. In particular, data curation acts as the glue between raw data and analytics, providing an abstraction layer that relieves users from time-consuming, tedious and error-prone curation tasks. In this context, the data curation process becomes a vital analytics asset for increasing added value and insights. In this paper, we identify and implement a set of curation APIs and make them available (on GitHub) to researchers and developers to assist them transforming their…
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Scientific Computing and Data Management
