# A Variability-Aware Design Approach to the Data Analysis Modeling Process

**Authors:** Maria Cristina Vale Tavares, Paulo Alencar, Donald Cowan

arXiv: 1812.10176 · 2018-12-27

## TL;DR

This paper introduces a variability-aware design approach for the data analysis modeling phase, aiming to improve automation and flexibility in big data science projects by modeling inherent variability.

## Contribution

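It proposes an approach that assesses the variability inherent in the CRISP-DM data analysis modeling phase, represents that variability with feature models, and defines a framework structural design that captures it. The design is then evaluated for the process automation it makes possible.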
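To make the idea of a feature model concrete, here is a minimal Python sketch of how variability in the modeling phase might be encoded and checked. It is not the paper's feature model or framework design. The four top-level features follow the generic tasks of the CRISP-DM modeling phase, while the alternatives beneath them (`Classification`, `HoldOut`, `HyperparameterSearch`, and so on), the `Feature` class, and the `violations` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Feature:
    """One node of a feature tree (illustrative, not the paper's notation)."""
    name: str
    mandatory: bool = True   # only consulted when the parent is an "and" group
    group: str = "and"       # how children combine: "and", "xor", or "or"
    children: List["Feature"] = field(default_factory=list)


def subtree_names(feature: Feature) -> Set[str]:
    """Collect the names of a feature and all of its descendants."""
    result = {feature.name}
    for child in feature.children:
        result |= subtree_names(child)
    return result


def violations(feature: Feature, selected: Set[str]) -> List[str]:
    """List rule violations below `feature`, assuming `feature` itself is selected."""
    problems: List[str] = []
    chosen = [c.name for c in feature.children if c.name in selected]
    if feature.children:
        if feature.group == "xor" and len(chosen) != 1:
            problems.append(f"'{feature.name}' needs exactly one child, got {sorted(chosen)}")
        elif feature.group == "or" and not chosen:
            problems.append(f"'{feature.name}' needs at least one child")
        elif feature.group == "and":
            for child in feature.children:
                if child.mandatory and child.name not in selected:
                    problems.append(f"'{child.name}' is mandatory under '{feature.name}'")
    for child in feature.children:
        if child.name in selected:
            problems.extend(violations(child, selected))
        else:
            # A descendant may not be selected when its parent is not.
            orphans = (subtree_names(child) - {child.name}) & selected
            if orphans:
                problems.append(f"{sorted(orphans)} selected without '{child.name}'")
    return problems


# Hypothetical variability model of the modeling phase.
modeling_phase = Feature("Modeling", children=[
    Feature("SelectTechnique", group="xor", children=[
        Feature("Classification"), Feature("Regression"), Feature("Clustering"),
    ]),
    Feature("GenerateTestDesign", group="xor", children=[
        Feature("HoldOut"), Feature("CrossValidation"),
    ]),
    Feature("BuildModel"),
    Feature("AssessModel"),
    Feature("HyperparameterSearch", mandatory=False),
])

valid_config = {
    "Modeling", "SelectTechnique", "Classification",
    "GenerateTestDesign", "CrossValidation", "BuildModel", "AssessModel",
}

print(violations(modeling_phase, valid_config))
print(violations(modeling_phase, valid_config | {"Regression"}))
```

A configuration that passes the check corresponds to one concrete variant of the modeling process. A framework built around such a model could, in principle, use the selected features to decide which components to instantiate, which is where the potential for automation comes from.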

## Key findings

- Framework captures variability in data analysis modeling
- Potential for increased automation in data analysis processes
- Enhances flexibility of data analysis system design

## Abstract

The massive amount of current data has led to many different forms of data analysis processes that aim to explore this data to uncover valuable insights. Methodologies to guide the development of big data science projects, including CRISP-DM and SEMMA, have been widely used in industry and academia. The data analysis modeling phase, which involves decisions on the most appropriate models to adopt, is at the core of these projects. However, from a software engineering perspective, the design and automation of activities performed in this phase are challenging. In this paper, we propose an approach to the data analysis modeling process which involves (i) the assessment of the variability inherent in the CRISP-DM data analysis modeling phase and the provision of feature models that represent this variability; (ii) the definition of a framework structural design that captures the identified variability; and (iii) the evaluation of the developed framework design in terms of the possibilities for process automation. The proposed approach advances the state of the art by offering a variability-aware design solution that can enhance system flexibility, potentially leading to novel software frameworks which can significantly improve the level of automation in the data analysis modeling process.

---
Source: https://tomesphere.com/paper/1812.10176