Big Data Dimensional Analysis

Vijay Gadepally; Jeremy Kepner

arXiv:1408.0517·cs.DB·August 1, 2016

TL;DR

This paper introduces Dimensional Data Analysis (DDA), a technique for quickly understanding the structure of large datasets and the anomalies they contain. It leverages existing schemas and adds minimal overhead to big data systems.

Contribution

The paper presents DDA, a novel method that efficiently analyzes big data structures and anomalies using existing schemas, reducing human effort and computational overhead.

Findings

01 DDA effectively identifies data structure and anomalies.

02 DDA has low overhead and integrates with existing systems.

03 Performance measurements show DDA's efficiency on various datasets.

Abstract

Collecting and analyzing large amounts of data is a growing problem within the scientific community. The growing gap between data and users calls for innovative tools that address the challenges posed by big data volume, velocity and variety. One of the main challenges associated with big data variety is automatically understanding the underlying structures and patterns of the data. Such an understanding is a prerequisite to applying advanced analytics to the data. Further, big data sets often contain anomalies and errors that are difficult to know a priori. Current approaches to understanding data structure are drawn from traditional database ontology design. These approaches are effective, but often require too much human involvement to scale to the volume, velocity and variety of data encountered by big data systems. Dimensional Data…
