Comparing Shape-Constrained Regression Algorithms for Data Validation

Florian Bachinger; Gabriel Kronberger

arXiv:2209.09602 · cs.LG · March 10, 2023

Open Access

TL;DR

This paper compares various shape-constrained regression algorithms to evaluate their effectiveness and efficiency in automated data validation, leveraging domain knowledge expressed as constraints.

Contribution

It provides a comparative analysis of shape-constrained regression methods for data validation, focusing on accuracy and runtime performance.

Findings

1. Shape-constrained regression algorithms vary in classification accuracy.

2. Runtime performance differs significantly among algorithms.

3. Certain algorithms outperform others in specific data validation scenarios.

Abstract

Industrial and scientific applications handle large volumes of data that render manual validation by humans infeasible. Therefore, we require automated data validation approaches that are able to consider the prior knowledge of domain experts to produce dependable, trustworthy assessments of data quality. Prior knowledge is often available as rules that describe interactions of inputs with regard to the target, e.g., the target must be monotonically decreasing and convex over increasing input values. Domain experts are able to validate multiple such interactions at a glance. However, existing rule-based data validation approaches are unable to consider these constraints. In this work, we compare different shape-constrained regression algorithms for the purpose of data validation based on their classification accuracy and runtime performance.
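The idea of validating data against a shape constraint can be illustrated with a minimal sketch. It is not one of the algorithms compared in the paper, which this page does not name. As an assumption, it substitutes the simplest shape-constrained regressor, isotonic regression via pool-adjacent-violators, and enforces only the "monotonically decreasing" part of the example constraint; convexity is not enforced. The function names, the sample values, and the threshold of 0.5 are all hypothetical. A dataset is flagged as invalid when the constrained fit cannot follow it closely, i.e. when the fit error exceeds the threshold.

```python
def fit_monotone_decreasing(ys):
    """Least-squares fit of a non-increasing sequence to ys.

    Assumes ys is already ordered by increasing input value.
    Uses the pool-adjacent-violators algorithm (PAVA).
    """
    blocks = []  # each block is [sum of values, number of values]
    for y in ys:
        blocks.append([float(y), 1])
        # Merge the last two blocks while the earlier block's mean is
        # smaller than the later one's, which violates non-increasing order.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted


def constraint_violation_rmse(ys):
    """RMSE between the data and its best monotone-decreasing fit."""
    fitted = fit_monotone_decreasing(ys)
    return (sum((y - f) ** 2 for y, f in zip(ys, fitted)) / len(ys)) ** 0.5


def is_valid(ys, threshold):
    """Hypothetical validation rule: accept data the constrained model can fit."""
    return constraint_violation_rmse(ys) <= threshold


# Illustrative values only, not data from the paper.
clean = [10.0, 8.0, 6.5, 5.5, 5.0, 4.8]
faulty = [10.0, 8.0, 6.5, 9.5, 5.0, 4.8]  # one reading breaks the trend

print(is_valid(clean, threshold=0.5))
print(is_valid(faulty, threshold=0.5))
```

In practice the threshold would have to be chosen relative to the expected measurement noise, since even valid data will not satisfy the constraint exactly.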

Taxonomy

Topics: Machine Learning and Data Classification · Gene expression and cancer classification · Neural Networks and Applications