Integer Programming Relaxations for Integrated Clustering and Outlier Detection
Lionel Ott, Linsey Pang, Fabio Ramos, David Howe, Sanjay Chawla

TL;DR
This paper introduces scalable integer programming relaxations for combined clustering and outlier detection, improving interpretability and robustness in large datasets.
Contribution
It proposes three novel relaxations of the integer program for integrated clustering and outlier detection, enhancing scalability and interpretability.
Findings
The methods effectively identify clusters and outliers in large datasets.
The relaxations yield solutions that scale to large data sets, and combining clustering with outlier selection makes the clusters more robust against data perturbations.
Evaluation demonstrates the approaches' effectiveness on synthetic and real data.
Abstract
In this paper we present methods for exemplar-based clustering with outlier selection based on the facility location formulation. Given a distance function and the number of outliers to be found, the methods automatically determine the number of clusters and select the outliers. We formulate the problem as an integer program, for which we present relaxations that yield solutions that scale to large data sets. The advantages of combining clustering and outlier selection include: (i) the resulting clusters tend to be compact and semantically coherent; (ii) the clusters are more robust against data perturbations; and (iii) the outliers are contextualised by the clusters and more interpretable, i.e. it is easier to distinguish outliers that result from data errors from those that may indicate a new pattern emerging in the data. We present and contrast three relaxations to the…
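To make the objective described in the abstract concrete, the following is a minimal sketch that solves a tiny instance exhaustively. It is not one of the paper's relaxations; it is the kind of exact search that relaxations are meant to avoid on large data. The function name `solve_tiny`, the uniform `exemplar_cost` parameter, and the Euclidean distance are assumptions made for illustration, since the excerpt does not give the exact integer program.

```python
from itertools import combinations


def solve_tiny(points, num_outliers, exemplar_cost):
    """Exhaustive search for exemplar-based clustering with outlier selection.

    Assumed objective (illustrative, not taken from the paper):
        exemplar_cost * (number of exemplars)
        + sum of distances from each non-outlier to its nearest exemplar.
    Returns (cost, exemplar_indices, outlier_indices).
    """
    n = len(points)

    def dist(a, b):
        # Euclidean distance; any distance function could be swapped in.
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    best = None
    # The number of exemplars (clusters) is not fixed in advance: try every size.
    for k in range(1, n - num_outliers + 1):
        for exemplars in combinations(range(n), k):
            nearest = [min(dist(points[i], points[j]) for j in exemplars)
                       for i in range(n)]
            # For a fixed exemplar set, the best outliers are the points
            # farthest from their nearest exemplar.
            order = sorted(range(n), key=lambda i: nearest[i], reverse=True)
            outliers = sorted(order[:num_outliers])
            cost = exemplar_cost * k + sum(nearest[i] for i in order[num_outliers:])
            if best is None or cost < best[0]:
                best = (cost, exemplars, outliers)
    return best


if __name__ == "__main__":
    # Two tight groups plus one distant point (made-up data).
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    cost, exemplars, outliers = solve_tiny(data, num_outliers=1, exemplar_cost=5)
    print(cost, exemplars, outliers)
```

The search enumerates every subset of candidate exemplars, so its running time grows exponentially with the number of points. This is why an exact approach like this is only usable on toy inputs and why scalable relaxations are of interest.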
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring
