Data Commons
Ramanathan V. Guha, Prashanth Radhakrishnan, Bo Xu, Wei Sun, Carolyn Au, Ajai Tirumali, Muhammad J. Amjad, Samantha Piekos, Natalie Diaz, Jennifer Chen, Julia Wu, Prem Ramaswami, James Manyika

TL;DR
Data Commons is a distributed platform that standardizes, processes, and interconnects public datasets from various sources into a unified Knowledge Graph, enabling easier access, integration, and natural language querying for societal problem-solving.
Contribution
The paper introduces a scalable architecture for a distributed network of data sites that share common schemas and APIs, which makes data integration and natural language search possible across sources.
Findings
Successful deployment of Data Commons architecture
Interoperability of data across multiple sources
Enabling natural language queries over integrated data
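To make the interoperability point concrete, here is a minimal Python sketch of what reconciling two sources into one common schema can look like. It is not the Data Commons API or schema: the `Observation` record, the source layouts, the identifiers, and the values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical common record: one value of one variable for one entity."""
    entity: str    # illustrative entity identifier
    variable: str  # illustrative variable name
    date: str
    value: float


# Two hypothetical sources reporting the same fact with different
# column names, entity naming, and units.
SOURCE_A = [{"state_fips": "06", "year": 2020, "pop": 39538223}]
SOURCE_B = [{"region": "California", "period": "2020",
             "population_thousands": 39538.223}]

# Assumed lookup from a place name to the shared entity identifier.
NAME_TO_ENTITY = {"California": "geoId/06"}


def from_source_a(row: dict) -> Observation:
    # Source A already uses a numeric code and reports raw counts.
    return Observation(
        entity=f"geoId/{row['state_fips']}",
        variable="Count_Person",
        date=str(row["year"]),
        value=float(row["pop"]),
    )


def from_source_b(row: dict) -> Observation:
    # Source B uses place names and reports thousands, so resolve the
    # name and convert the unit before emitting the common record.
    return Observation(
        entity=NAME_TO_ENTITY[row["region"]],
        variable="Count_Person",
        date=row["period"],
        value=float(round(row["population_thousands"] * 1000)),
    )


# Once both sources are in the common schema, identical facts collapse.
merged = {from_source_a(r) for r in SOURCE_A} | {from_source_b(r) for r in SOURCE_B}
```

The sketch does the mapping once, at ingestion time, so that downstream users query a single record shape instead of repeating the reconciliation themselves.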
Abstract
Publicly available data from open sources (e.g., United States Census Bureau (Census), World Health Organization (WHO), Intergovernmental Panel on Climate Change (IPCC)) are vital resources for policy makers, students and researchers across different disciplines. Combining data from different sources requires the user to reconcile the differences in schemas, formats, assumptions, and more. This data wrangling is time consuming, tedious and needs to be repeated by every user of the data. Our goal with Data Commons (DC) is to help make public data accessible and useful to those who want to understand this data and use it to solve societal challenges and opportunities. We do the data processing and make the processed data widely available via standard schemas and Cloud APIs. Data Commons is a distributed network of sites that publish data in a common schema and interoperate using the Data…
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Advanced Graph Neural Networks
