# DFS: A Dataset File System for Data Discovering Users

**Authors:** Yasith Jayawardana, Sampath Jayarathna

arXiv: 1905.13363 · 2020-04-07

## TL;DR

This paper introduces DFS, a standardized dataset file system, and DDU, a scalable cloud architecture, to improve data discovery, metadata management, and integration with data tools for efficient secondary data analysis.

## Contribution

It proposes DFS and DDU to standardize dataset metadata and enable scalable, semi-automated data discovery and recommendation in digital libraries.

## Key findings

- DFS standardizes dataset metadata representation.
- DDU enables scalable, semi-automated metadata generation.
- The approach facilitates automatic dataset aggregation and integration.

## Abstract

Many research questions can be answered quickly and efficiently using data already collected for previous research. This practice is called secondary data analysis (SDA), and has gained popularity due to lower costs and improved research efficiency. In this paper we propose DFS, a file system to standardize the metadata representation of datasets, and DDU, a scalable architecture based on DFS for semi-automated metadata generation and data recommendation on the cloud. We discuss how DFS and DDU lay the groundwork for automatic dataset aggregation and how they integrate with existing data wrangling and machine learning tools, and we explore their implications for datasets stored in digital libraries.
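This summary does not reproduce the DFS layout itself, so the following is only an illustrative sketch of the general idea of keeping a standardized, machine-readable metadata record alongside a dataset's files. The `DatasetMetadata` fields, the `describe_dataset` and `write_manifest` helpers, and the `metadata.json` filename are all assumptions made for this example and are not taken from the paper.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative, not from the paper."""
    name: str
    description: str
    files: list = field(default_factory=list)


def describe_dataset(root: Path, description: str = "") -> DatasetMetadata:
    """Walk a dataset directory and record each file's relative path and size in bytes."""
    files = [
        {"path": p.relative_to(root).as_posix(), "bytes": p.stat().st_size}
        for p in sorted(root.rglob("*"))
        if p.is_file()
    ]
    return DatasetMetadata(name=root.name, description=description, files=files)


def write_manifest(root: Path, meta: DatasetMetadata, filename: str = "metadata.json") -> Path:
    """Serialize the record as JSON at the dataset root (the filename is an assumption)."""
    out = root / filename
    out.write_text(json.dumps(asdict(meta), indent=2))
    return out
```

A tool that aggregates or recommends datasets could then read such a manifest instead of inspecting each dataset's files directly, which is the kind of integration the abstract alludes to.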

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.13363/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1905.13363/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1905.13363/full.md

---
Source: https://tomesphere.com/paper/1905.13363