A Datalake for Data-driven Social Science Research
Puneet Arya, Ojas Sahasrabudhe, Adwaiya Srivastav, Partha Pratim Das, Maya Ramanath

TL;DR
This paper introduces a specialized Datalake infrastructure designed to facilitate data-driven social science research by integrating diverse data, ensuring transparency, and supporting analysis, thereby democratizing access to advanced data science tools.
Contribution
The paper presents a novel Datalake system tailored for social science research, supporting diverse data ingestion, provenance tracking, access control, and analysis tools, with real-world applications.
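As a rough illustration of how provenance tracking, version tracking, and role-based access control can fit together at ingestion time, here is a minimal sketch. It is not the authors' implementation: the class `MiniLake`, the role names, and the `source` field are all hypothetical, chosen only to make the idea concrete.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical role-to-permission mapping; the paper does not list its roles.
ROLE_PERMISSIONS = {
    "admin": {"read", "write"},
    "researcher": {"read", "write"},
    "viewer": {"read"},
}


@dataclass
class Version:
    number: int      # 1-based version counter per dataset
    checksum: str    # SHA-256 hex digest of the ingested bytes
    author: str      # who ingested this version
    source: str      # free-text provenance note (assumed field)


@dataclass
class Dataset:
    name: str
    versions: list = field(default_factory=list)


class MiniLake:
    """Toy in-memory store that records a version history per dataset."""

    def __init__(self):
        self._datasets = {}

    def _check(self, role, action):
        # Unknown roles get no permissions at all.
        if action not in ROLE_PERMISSIONS.get(role, set()):
            raise PermissionError(f"role {role!r} may not {action}")

    def ingest(self, user, role, name, payload, source):
        """Record a new version unless the bytes match the latest one."""
        self._check(role, "write")
        dataset = self._datasets.setdefault(name, Dataset(name))
        checksum = hashlib.sha256(payload).hexdigest()
        if dataset.versions and dataset.versions[-1].checksum == checksum:
            return dataset.versions[-1]  # unchanged content, no new version
        version = Version(len(dataset.versions) + 1, checksum, user, source)
        dataset.versions.append(version)
        return version

    def history(self, role, name):
        """Return a copy of the provenance trail for a dataset."""
        self._check(role, "read")
        return list(self._datasets[name].versions)
```

In this sketch, re-ingesting identical bytes returns the existing version instead of creating a new one, so the history only grows when the content actually changes. A production system would persist this metadata and tie roles to authenticated users.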
Findings
Streamlines social science research processes.
Enhances transparency and reproducibility.
Democratizes access to data science tools.
Abstract
Social science research increasingly demands data-driven insights, yet researchers often face barriers such as lack of technical expertise, inconsistent data formats, and limited access to reliable datasets. In this paper, we present a Datalake infrastructure tailored to the needs of interdisciplinary social science research. Our system supports ingestion and integration of diverse data types, automatic provenance and version tracking, role-based access control, and built-in tools for visualization and analysis. We demonstrate the utility of our Datalake using real-world use cases spanning governance, health, and education. A detailed walkthrough of one such use case -- analyzing the…
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Computational and Text Analysis Methods
