Towards a Virtual Data Centre for Classics
Tobias Blanke, Mark Hedges

TL;DR
This paper explores integrating diverse datasets in Classics, comparing relational models and full-text virtualization to improve data access, organization, and retrieval in humanities research environments.
Contribution
It introduces a novel approach using full-text indexes for data virtualization, enhancing flexibility and semantic handling in humanities data integration.
Findings
Relational models are limited for unstructured text datasets.
Full-text indexes enable flexible, researcher-driven data views.
Full-text virtualization improves semantic and retrieval capabilities.
Abstract
This paper presents some of our work on integrating datasets in Classics, drawing on the results of several projects in this domain. The conclusions from LaQuAT concerned the limitations of the approach rather than solutions. The relational model followed by OGSA-DAI was more effective for resources that consist primarily of structured data (which we call data-centric) than for largely unstructured text (which we call text-centric), which makes up a significant component of the datasets we were using. This approach was, moreover, insufficiently flexible to deal with the semantic issues. The gMan project, on the other hand, addressed these problems by virtualizing data resources using full-text indexes, which can then be used to provide different views onto the collections and services that more closely match the sort of information organization and retrieval activities found in…
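To make the idea of virtualizing heterogeneous resources through a full-text index more concrete, here is a minimal sketch in Python. It is not the gMan implementation, which this summary does not describe. The collection names, field names, sample records, and the helper functions `tokenize`, `flatten`, `build_index`, and `search` are all hypothetical, chosen only to illustrate how records with different schemas can be searched through one index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def flatten(record):
    """Join all field values of a record into one string, ignoring its schema."""
    return " ".join(str(value) for value in record.values())


def build_index(collections):
    """Map each token to the set of (collection, record_id) pairs containing it."""
    index = defaultdict(set)
    for name, records in collections.items():
        for record_id, record in records.items():
            for token in tokenize(flatten(record)):
                index[token].add((name, record_id))
    return index


def search(index, query):
    """Return the records that contain every token of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = [index.get(token, set()) for token in tokens]
    return set.intersection(*hits)


# Hypothetical sample data: two collections with unrelated schemas.
collections = {
    "inscriptions": {
        1: {"place": "Aphrodisias", "text": "dedication to the emperor"},
    },
    "papyri": {
        7: {"body": "Letter concerning a dedication at the temple"},
        8: {"body": "Tax receipt"},
    },
}

index = build_index(collections)
results = search(index, "dedication")
```

In this sketch a single query reaches both collections without a shared relational schema, which is the kind of flexibility the abstract attributes to full-text indexes. A real system would also need ranking, stemming, and handling of the semantic issues the abstract mentions, none of which is attempted here.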
Taxonomy
Topics: Digital Humanities and Scholarship
