Consistency-guided semi-supervised outlier detection in heterogeneous data using fuzzy rough sets
Baiyang Chen, Zhong Yuan, Dezhong Peng, Xiaoliang Chen, Hongmei Chen

TL;DR
This paper introduces a semi-supervised outlier detection method for heterogeneous data using fuzzy rough sets, leveraging partial labels and consistency measures to improve detection accuracy.
Contribution
The paper proposes a consistency-guided outlier detection algorithm (COD) that applies fuzzy rough set theory to heterogeneous data in a semi-supervised setting.
Findings
Outperforms or matches leading outlier detectors on 15 datasets.
Effectively utilizes partial labels and fuzzy similarity relations.
Demonstrates robustness across diverse heterogeneous data.
Abstract
Outlier detection aims to find samples that behave differently from the majority of the data. Semi-supervised detection methods can utilize the supervision of partial labels, thus reducing false positive rates. However, most of the current semi-supervised methods focus on numerical data and neglect the heterogeneity of data information. In this paper, we propose a consistency-guided outlier detection algorithm (COD) for heterogeneous data with the fuzzy rough set theory in a semi-supervised manner. First, a few labeled outliers are leveraged to construct label-informed fuzzy similarity relations. Next, the consistency of the fuzzy decision system is introduced to evaluate attributes' contributions to knowledge classification. Subsequently, we define the outlier factor based on the fuzzy similarity class and predict outliers by integrating the classification consistency and the outlier…
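To make the idea of an outlier factor built on fuzzy similarity classes more concrete, here is a minimal Python sketch. It is not the COD algorithm from the paper: the abstract does not give the label-informed relation, the consistency measure, or the exact outlier factor, so all of those are left out. The per-attribute similarities (range-normalized distance for numeric values, exact match for categorical ones), the use of the minimum to combine attributes, and the function names are assumptions made only for illustration.

```python
def attribute_similarity(a, b, value_range):
    """Fuzzy similarity of two values on one attribute, in [0, 1].

    value_range is None for a categorical attribute (assumed convention),
    otherwise max - min of the numeric column.
    """
    if value_range is None:
        # Categorical attribute: crisp match / no match.
        return 1.0 if a == b else 0.0
    if value_range == 0:
        # Constant numeric attribute: all values are identical.
        return 1.0
    # Numeric attribute: 1 minus the range-normalized distance.
    return max(0.0, 1.0 - abs(a - b) / value_range)


def fuzzy_similarity_matrix(rows):
    """Pairwise fuzzy similarity over all attributes of mixed-type rows."""
    n, m = len(rows), len(rows[0])
    ranges = []
    for k in range(m):
        column = [row[k] for row in rows]
        if all(isinstance(v, (int, float)) for v in column):
            ranges.append(max(column) - min(column))
        else:
            ranges.append(None)
    # Combine attributes with the minimum (an assumed aggregation choice).
    return [
        [
            min(
                attribute_similarity(rows[i][k], rows[j][k], ranges[k])
                for k in range(m)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


def outlier_scores(rows):
    """Score each row as 1 minus the relative size of its similarity class.

    The fuzzy cardinality of row i's similarity class is the sum of its
    similarities to every row (itself included). Small classes score high.
    """
    relation = fuzzy_similarity_matrix(rows)
    n = len(rows)
    return [1.0 - sum(relation[i]) / n for i in range(n)]


# Hypothetical toy data: one numeric and one categorical attribute.
records = [(1.0, "a"), (1.1, "a"), (0.9, "a"), (5.0, "b")]
scores = outlier_scores(records)
```

On this toy data the last record, which differs from the rest on both attributes, is similar only to itself and so receives the highest score. In a semi-supervised variant, the few labeled outliers would additionally shape the similarity relation and the weighting of attributes, which this sketch does not attempt.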
Taxonomy
Topics: Rough Sets and Fuzzy Logic · Anomaly Detection Techniques and Applications · Imbalanced Data Classification Techniques
