SCISSOR: Mitigating Semantic Bias through Cluster-Aware Siamese Networks for Robust Classification
Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

TL;DR
SCISSOR is a novel Siamese network approach that reduces semantic bias by discouraging spurious semantic clusters, significantly improving model robustness across vision and NLP benchmarks without data augmentation.
Contribution
Introduces SCISSOR, a cluster-aware Siamese network method that effectively mitigates semantic shortcut biases without requiring data augmentation or rewriting.
Findings
SCISSOR improves F1 by up to 7.7 absolute points over baselines: +5.3 on GYAFC, +7.3 on Yelp, and +7.7 on Chest-XRay.
It enhances lightweight model performance by over 9%.
It shows that shortcuts arise not only from superficial features but also from imbalances in the semantic distribution of sample embeddings.
Abstract
Shortcut learning undermines model generalization to out-of-distribution data. While the literature attributes shortcuts to biases in superficial features, we show that imbalances in the semantic distribution of sample embeddings induce spurious semantic correlations, compromising model robustness. To address this issue, we propose SCISSOR (Semantic Cluster Intervention for Suppressing ShORtcut), a Siamese network-based debiasing approach that remaps the semantic space by discouraging latent clusters exploited as shortcuts. Unlike prior data-debiasing approaches, SCISSOR eliminates the need for data augmentation and rewriting. We evaluate SCISSOR on 6 models across 4 benchmarks: Chest-XRay and Not-MNIST in computer vision, and GYAFC and Yelp in NLP tasks. Compared to several baselines, SCISSOR reports +5.3 absolute points in F1 score on GYAFC, +7.3 on Yelp, +7.7 on Chest-XRay, and +1 on…
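The idea of discouraging latent clusters with a Siamese-style objective can be illustrated with a small triplet sketch. This is not the authors' implementation, and the abstract does not specify the loss or the clustering step. It assumes cluster assignments are already available from some clustering of the embeddings, and it assumes one plausible pairing rule: pull an anchor toward a same-label sample from a different cluster, and push it away from a different-label sample in the same cluster. The names `sample_triplets` and `triplet_margin_loss` are hypothetical.

```python
import numpy as np

def sample_triplets(labels, clusters, rng):
    """Build (anchor, positive, negative) index triplets.

    Assumed rule (illustrative only):
      positive = same label, different cluster
      negative = different label, same cluster
    Anchors with no valid positive or negative are skipped.
    """
    labels = np.asarray(labels)
    clusters = np.asarray(clusters)
    triplets = []
    for a in range(len(labels)):
        pos = np.flatnonzero((labels == labels[a]) & (clusters != clusters[a]))
        neg = np.flatnonzero((labels != labels[a]) & (clusters == clusters[a]))
        if len(pos) and len(neg):
            triplets.append((a, rng.choice(pos), rng.choice(neg)))
    return np.array(triplets, dtype=int).reshape(-1, 3)

def triplet_margin_loss(emb, triplets, margin=1.0):
    """Mean hinge loss max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance."""
    if len(triplets) == 0:
        return 0.0
    a = emb[triplets[:, 0]]
    p = emb[triplets[:, 1]]
    n = emb[triplets[:, 2]]
    d_pos = np.linalg.norm(a - p, axis=1)
    d_neg = np.linalg.norm(a - n, axis=1)
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())

# Toy data: two tight clusters, each mixing both labels, so cluster
# membership carries no label information and should not be relied on.
emb = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
labels = [0, 1, 0, 1]
clusters = [0, 0, 1, 1]

rng = np.random.default_rng(0)
triplets = sample_triplets(labels, clusters, rng)
loss = triplet_margin_loss(emb, triplets)
```

In this toy setup the loss is large because same-label samples sit far apart while different-label samples share a cluster. Minimizing it with a trainable embedding head would move the space toward label-driven rather than cluster-driven structure.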
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Imbalanced Data Classification Techniques
Methods: Attention Dropout · Dropout · Softmax · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · BERT
