AutoFAIR: Automatic Data FAIRification via Machine Reading
Tingyan Ma, Wei Liu, Bin Lu, Xiaoying Gan, Yunqiang Zhu, Luoyi Fu, Chenghu Zhou

TL;DR
AutoFAIR automates the process of making data compliant with FAIR principles using machine reading and semantic matching, significantly improving data findability, accessibility, interoperability, and reusability.
Contribution
This paper introduces AutoFAIR, a novel architecture that automates data FAIRification leveraging language models and ontology guidance, addressing the inefficiencies of manual FAIRification.
Findings
AutoFAIR improves FAIRness scores across various datasets.
Significant enhancement in data findability and reusability.
Effective automatic extraction and alignment of metadata.
Abstract
The explosive growth of data fuels data-driven research, facilitating progress across diverse domains. The FAIR principles have emerged as a guiding standard, aiming to enhance the findability, accessibility, interoperability, and reusability of data. However, current efforts primarily focus on manual data FAIRification, which can only handle targeted data and lacks efficiency. To address this issue, we propose AutoFAIR, an architecture designed to enhance data FAIRness automatically. First, we align each data and metadata operation with specific FAIR indicators to guide machine-executable actions. Then, we use Web Reader to automatically extract metadata based on language models, even in the absence of structured data webpage schemas. Subsequently, FAIR Alignment is employed to make metadata comply with FAIR principles through ontology guidance and semantic matching. Finally, by applying…
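The alignment step described in the abstract (mapping extracted metadata onto a target vocabulary by matching) can be illustrated with a small sketch. This is not AutoFAIR's implementation: the target field list, the synonym table, the threshold, and the function names are all hypothetical, and plain string similarity from Python's standard `difflib` stands in for the language-model-based semantic matching the paper refers to.

```python
from difflib import SequenceMatcher

# Hypothetical target vocabulary (assumption): a handful of field names
# in the style of a dataset metadata schema. Not taken from the paper.
TARGET_FIELDS = ["name", "description", "creator", "license",
                 "keywords", "datePublished"]

# Hypothetical synonym hints (assumption), standing in for ontology guidance.
SYNONYMS = {
    "title": "name",
    "author": "creator",
    "authors": "creator",
    "abstract": "description",
    "tags": "keywords",
}


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for semantic matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_field(raw_key, threshold=0.6):
    """Map one extracted key to a target field, or None if nothing is close."""
    key = raw_key.strip().lower()
    if key in SYNONYMS:
        return SYNONYMS[key]
    best = max(TARGET_FIELDS, key=lambda t: similarity(key, t))
    return best if similarity(key, best) >= threshold else None


def align_metadata(raw):
    """Split extracted metadata into aligned fields and unmatched leftovers."""
    aligned, unmatched = {}, {}
    for key, value in raw.items():
        target = align_field(key)
        if target is None:
            unmatched[key] = value
        else:
            aligned[target] = value
    return aligned, unmatched


if __name__ == "__main__":
    # Made-up example record, as a web page scraper might produce it.
    extracted = {"Title": "Soil moisture grid", "Licence": "CC-BY-4.0",
                 "Author": "A. Researcher", "xyz": 1}
    print(align_metadata(extracted))
```

Keeping unmatched keys separate rather than dropping them leaves room for a later, stronger matcher (for example an embedding model) to retry them without re-extracting the page.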
Taxonomy
Topics: Research Data Management Practices · Scientific Computing and Data Management · Data Quality and Management
Methods: Ontology · Focus · ALIGN
