Data Depth as a Risk

Arturo Castellanos; Pavlo Mozharovskyi

arXiv:2507.08518·stat.ML·July 14, 2025

TL;DR

This paper introduces a novel family of data depths called 'loss depths' that interpret data centrality as classifier risk, enabling efficient high-dimensional anomaly detection and connecting data geometry with classifier complexity.

Contribution

The paper extends traditional data depth by framing it as classifier risk, which allows machine learning algorithms to be used for depth computation and facilitates high-dimensional data analysis.

Findings

01. Loss depths can be computed efficiently using existing classifiers.

02. They perform well in anomaly detection tasks.

03. The framework connects data centrality with classifier complexity.

Abstract

Data depths are score functions that quantify, in an unsupervised fashion, how central a point is within a distribution, with numerous applications across various fields, such as anomaly detection and multivariate or functional data analysis. The halfspace depth was the first depth to aim at generalising the notion of quantile beyond the univariate case. Among the existing variety of depth definitions, it remains one of the most used notions of data depth. Taking a different angle from the quantile point of view, we show that the halfspace depth can also be regarded as the minimum loss of a set of classifiers for a specific labelling of the points. By changing the loss or the set of classifiers considered, this new angle naturally leads to a family of "loss depths", extending to well-studied classifiers such as SVM or logistic regression. This framework directly…
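The reading of halfspace depth as a minimum classifier loss can be illustrated with a small sketch. It rests on assumptions that are not taken from the paper: every sample point receives the same label 0, the classifiers are linear rules through the query point x that predict 1 on the closed halfspace {z : u·(z − x) ≥ 0}, and the minimum over all directions u is approximated with random unit vectors. Under that labelling, the empirical 0-1 risk of a classifier equals the fraction of points in its halfspace, so the smallest risk is an approximation (from above) of the halfspace depth of x. The function name `halfspace_depth_as_risk` and its parameters are hypothetical, and the paper's actual labelling and algorithms may differ.

```python
import numpy as np


def halfspace_depth_as_risk(x, data, n_directions=2000, seed=0):
    """Approximate the halfspace depth of x as a minimum empirical 0-1 risk.

    Assumption (illustrative, not the paper's construction): all sample
    points are labelled 0, and each classifier predicts 1 on the closed
    halfspace {z : u . (z - x) >= 0} for a unit vector u.
    """
    rng = np.random.default_rng(seed)
    dim = data.shape[1]

    # Draw random unit normals; each one defines a linear classifier through x.
    directions = rng.normal(size=(n_directions, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # margins[k, i] = u_k . (X_i - x)
    margins = directions @ (data - x).T

    # With every label equal to 0, the 0-1 loss of classifier k is the
    # fraction of sample points it predicts as 1.
    risks = (margins >= 0).mean(axis=1)

    # Minimum risk over the sampled classifiers (upper bound on the depth).
    return risks.min()


rng = np.random.default_rng(42)
sample = rng.normal(size=(500, 2))

depth_center = halfspace_depth_as_risk(np.zeros(2), sample)
depth_outlier = halfspace_depth_as_risk(np.array([10.0, 10.0]), sample)

print(f"depth near the centre: {depth_center:.3f}")
print(f"depth of a far point:  {depth_outlier:.3f}")
```

A point near the centre of the sample gets a large minimum risk, because every halfspace through it captures a substantial share of the data, while a point far outside the sample can be separated by a halfspace containing almost no data, so its minimum risk is close to zero. Replacing the 0-1 loss or the classifier family in this sketch is the kind of change the abstract describes as leading to "loss depths".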
