Data Lakes for Digital Humanities
Jérôme Darmont (ERIC), Cécile Favre (ERIC), Sabine Loudcher (ERIC), Camille Noûs

TL;DR
This paper advocates using data lakes to manage diverse data formats in Digital Humanities, highlighting ongoing projects and lessons learned to address data siloing and variety challenges.
Contribution
It introduces the application of data lakes in Digital Humanities, demonstrating their potential to handle complex, heterogeneous data sources effectively.
Findings
Data lakes help integrate diverse humanities data formats.
Collaborative projects reveal practical benefits and challenges.
Lessons learned inform future Digital Humanities data management.
Abstract
Traditional data in Digital Humanities projects bear various formats (structured, semi-structured, textual) and need substantial transformations (encoding and tagging, stemming, lemmatization, etc.) to be managed and analyzed. To fully master this process, we propose the use of data lakes as a solution to data siloing and big data variety problems. We describe data lake projects we currently run in close collaboration with researchers in humanities and social sciences and discuss the lessons learned running these projects.
