A Secure Data Enclave and Analytics Platform for Social Scientists
Yadu N. Babuji, Kyle Chard, Aaron Gerow, Eamon Duede

TL;DR
CLOUD KOTTA is a cloud-based platform that securely manages and analyzes sensitive social science data, giving researchers scalable, cost-effective infrastructure for complex analyses without having to manage that infrastructure themselves.
Contribution
The paper introduces CLOUD KOTTA, a novel cloud architecture that automates secure, scalable, and cost-efficient data storage and analysis, tailored specifically to the needs of social science research.
Findings
Manages approximately 10TB of data in production.
Processed over 5TB of data with 75,000 CPU hours.
Supports diverse workflows including text analysis and machine learning.
Abstract
Data-driven research is increasingly ubiquitous and data itself is a defining asset for researchers, particularly in the computational social sciences and humanities. Entire careers and research communities are built around valuable, proprietary or sensitive datasets. However, many existing computation resources fail to support secure and cost-effective storage of data while also enabling secure and flexible analysis of the data. To address these needs we present CLOUD KOTTA, a cloud-based architecture for the secure management and analysis of social science data. CLOUD KOTTA leverages reliable, secure, and scalable cloud resources to deliver capabilities to users, and removes the need for users to manage complicated infrastructure. CLOUD KOTTA implements automated, cost-aware models for efficiently provisioning tiered storage and automatically scaled compute resources. CLOUD KOTTA has…
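The abstract mentions cost-aware models for provisioning tiered storage but does not spell them out here. As a rough illustration of the general idea only, and not the authors' actual model, the sketch below picks the cheapest storage tier for a dataset given its size and expected read frequency. The tier names, the prices, and the functions `monthly_cost` and `cheapest_tier` are all hypothetical placeholders chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """A storage tier with illustrative (hypothetical) prices."""
    name: str
    storage_cost_per_gb_month: float  # cost to keep 1 GB for one month
    retrieval_cost_per_gb: float      # cost to read 1 GB once


# Hypothetical tiers: cheaper storage comes with more expensive retrieval.
TIERS = [
    Tier("hot", 0.023, 0.0),
    Tier("warm", 0.0125, 0.01),
    Tier("cold", 0.004, 0.03),
]


def monthly_cost(tier: Tier, size_gb: float, reads_per_month: float) -> float:
    """Estimated monthly cost: storage plus expected full-dataset reads."""
    per_gb = tier.storage_cost_per_gb_month + reads_per_month * tier.retrieval_cost_per_gb
    return size_gb * per_gb


def cheapest_tier(size_gb: float, reads_per_month: float) -> Tier:
    """Return the tier with the lowest estimated monthly cost."""
    return min(TIERS, key=lambda t: monthly_cost(t, size_gb, reads_per_month))


if __name__ == "__main__":
    # Compare placements for a 100 GB dataset at different access rates.
    for reads in (0, 0.5, 10):
        tier = cheapest_tier(100, reads)
        print(f"{reads} reads/month -> {tier.name}")
```

Under these assumed prices, rarely read data lands in the cheapest storage tier while frequently read data stays in the tier with free retrieval. A production system would also need to account for minimum storage durations, retrieval latency, and transition costs, none of which are modeled here.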
Taxonomy
Topics: Scientific Computing and Data Management · Data Quality and Management · Advanced Database Systems and Queries
