# Datenqualität in Regressionsproblemen (Data Quality in Regression Problems)

**Authors:** Wolfgang Doneit, Ralf Mikut, Markus Reischl

arXiv: 1701.04342 · 2017-01-17

## TL;DR

This paper proposes criteria that quantify typical phenomena, such as correlations and clusters, in regression input data collected without a design of experiments, and shows their suitability on simulated benchmark datasets.

## Contribution

The paper proposes new criteria for quantifying typical phenomena in the input data of regression problems and shows their suitability on simulated benchmark datasets.

## Key findings

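- The proposed criteria quantify typical phenomena of regression input data, such as correlations and clusters.
- The distribution of the input data influences the reliability of regression models.
- Simulated benchmark datasets show the suitability of the criteria.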

## Abstract

Regression models are increasingly built on datasets that do not follow a design of experiments. Instead, the data are gathered, for example, by automated monitoring of a technical system. As a consequence, the input data themselves already reflect phenomena of the system and violate statistical assumptions about their distribution. The input data can show correlations, clusters or other patterns. Further, the distribution of the input data influences the reliability of regression models. We propose criteria to quantify typical phenomena of input data for regression and show their suitability with simulated benchmark datasets.
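The paper's own criteria are not reproduced in this summary view. As a loosely related illustration of one phenomenon the abstract names, correlated inputs, the following minimal sketch compares a dataset with independently sampled inputs against a passively observed one. The function name `max_abs_input_correlation`, the synthetic data, and the use of the Pearson correlation are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def max_abs_input_correlation(X):
    """Largest absolute Pearson correlation between any two input columns.

    Hypothetical helper for illustration; not a criterion from the paper.
    """
    corr = np.corrcoef(X, rowvar=False)
    # Ignore the diagonal, which is always 1.
    off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
    return float(np.max(np.abs(off_diagonal)))


rng = np.random.default_rng(0)
n = 200

# Inputs sampled independently, as a stand-in for a designed experiment.
designed = rng.uniform(0.0, 1.0, size=(n, 2))

# Inputs from a monitored system in which the second input tracks the first.
x1 = rng.uniform(0.0, 1.0, size=n)
observed = np.column_stack([x1, x1 + rng.normal(0.0, 0.05, size=n)])

designed_score = max_abs_input_correlation(designed)
observed_score = max_abs_input_correlation(observed)
print(f"designed: {designed_score:.2f}, observed: {observed_score:.2f}")
```

A score near 1 indicates that two inputs carry almost the same information, so a regression model fitted on such data has little support away from the line along which the inputs vary together.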

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.04342/full.md

## Figures

16 figures with captions in the complete paper: https://tomesphere.com/paper/1701.04342/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1701.04342/full.md

---
Source: https://tomesphere.com/paper/1701.04342