Fair Anomaly Detection For Imbalanced Groups
Ziwei Wu, Lecheng Zheng, Yuancheng Yu, Ruizhong Qiu, John Birge, Jingrui He

TL;DR
This paper introduces FairAD, a fairness-aware anomaly detection method designed for imbalanced group scenarios. It combines contrastive learning with a rebalancing autoencoder, comes with theoretical fairness guarantees, and is validated on real datasets.
Contribution
The paper proposes FairAD, a new anomaly detection approach that ensures fairness in imbalanced group settings through contrastive learning and autoencoders, with proven theoretical fairness guarantees.
Findings
FairAD outperforms existing methods on real-world datasets.
Theoretical analysis confirms fairness guarantees of the contrastive learning component.
Empirical results show improved detection accuracy for protected groups.
Abstract
Anomaly detection (AD) has been widely studied for decades in many real-world applications, including fraud detection in finance and intrusion detection in cybersecurity. Because protected and unprotected groups are imbalanced, and the distributions of normal examples and anomalies are imbalanced as well, the learning objectives of most existing anomaly detection methods tend to concentrate solely on the dominant unprotected group. Many researchers have therefore recognized the importance of ensuring model fairness in anomaly detection. However, existing fair anomaly detection methods tend to erroneously label most normal examples from the protected group as anomalies in the imbalanced scenario where the unprotected group is more abundant than the protected group. This phenomenon is caused by the improper design of learning objectives, which statistically…
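The imbalance effect described in the abstract can be illustrated with a small sketch. This is not the FairAD objective, which the abstract does not spell out; it only contrasts a pooled average of per-example reconstruction errors with a group-balanced average. The function names, group labels, and error values below are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def pooled_loss(errors):
    """Plain mean over all examples; large groups dominate the result."""
    return sum(errors) / len(errors)


def group_balanced_loss(errors, groups):
    """Mean of per-group means, so every group contributes equally."""
    per_group = defaultdict(list)
    for err, grp in zip(errors, groups):
        per_group[grp].append(err)
    group_means = [sum(v) / len(v) for v in per_group.values()]
    return sum(group_means) / len(group_means)


# Hypothetical data: 950 unprotected examples that are reconstructed well,
# and 50 protected examples that are reconstructed poorly.
errors = [0.1] * 950 + [0.9] * 50
groups = ["unprotected"] * 950 + ["protected"] * 50

pooled = pooled_loss(errors)                    # close to the majority error
balanced = group_balanced_loss(errors, groups)  # midway between the groups

print(f"pooled={pooled:.2f}, balanced={balanced:.2f}")
```

In this toy setup the pooled loss stays near the majority group's error even though the protected group is reconstructed poorly, which is one way an objective can end up concentrating on the dominant group. A group-balanced average exposes the gap, under the assumption that group membership is known at training time.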
Peer Reviews
Decision·Submitted to ICLR 2025
1. The issue of fairness in anomaly detection is a highly important and meaningful problem. 2. The paper provides theoretical guarantees for the proposed method. 3. The method's effectiveness is empirically validated on real datasets.
1. The paper presents two challenges: C1, handling imbalanced data, and C2, mitigating representation disparity. There is a strong coupling between these two challenges, as imbalanced data leads to representation disparity (i.e., group imbalance results in higher errors for protected groups, causing misclassifications). From this perspective, addressing C1 effectively resolves C2, which makes me question the necessity of the fairness-aware contrastive learning module. 2. The paper lacks novelty.
1. A rebalancing autoencoder with a learnable weight for the reconstruction loss is a simple way to encourage learning patterns from minority groups. I like the analytical calculation of \epsilon. 2. An elegant extension of contrastive entropy for uniform representation that incorporates the fairness criterion as contrastive entropy across majority and minority groups. 3. The paper is easy to read and follow.
1. Most applications involve multiple protected attributes, each of which may be multi-valued. Extending the loss function to such datasets seems non-trivial. How does the method scale to a setup with multiple multi-valued protected attributes? 2. The choice of the \alpha hyperparameter is arbitrary. Why does 4 work and not 8? How does one choose the value of this hyperparameter in a real-world scenario? 3. The paper completely ignores the discussion on hyperparameter settings for the competitors. To ensure
1. The proposed technique shows that if the underrepresented groups are known, that information can possibly be used to improve anomaly detection in fairly straightforward ways. 2. The ablation experiments help to understand the importance of individual components of the loss function. ============ Update after author rebuttals: Revising my scores after going over the other reviewers' comments and being satisfied with most of the authors' responses to my comments.
Overall, the paper needs to be written more clearly and unambiguously. Main comments are: 1. Section 2: The paper defines protected groups on the basis of a single feature -- this might be rather simplistic. Instead, in real data, there could be a combination of one or more features that defines the protected/unprotected classification. In reality, it might not be easy to identify the underrepresented groups automatically. 2. Why restrict to a single protected group? There might be multiple
Taxonomy
Topics: Imbalanced Data Classification Techniques · Anomaly Detection Techniques and Applications · Network Security and Intrusion Detection
Methods: Focus · Contrastive Learning
