TRAPDOOR: Repurposing backdoors to detect dataset bias in machine learning-based genomic analysis
Esha Sarkar, Michail Maniatakos

TL;DR
TRAPDOOR is a novel method that repurposes neural network backdoors to detect and quantify dataset bias in machine learning-based genomic analysis, with the aim of fairer and more accurate predictions for under-represented groups.
Contribution
This work introduces TRAPDOOR, a new approach that leverages backdoor techniques to identify and measure bias in genomic datasets without affecting model performance.
Findings
Detects dataset bias with 100% accuracy
Accurately estimates the extent of bias with minimal error
Effective on real-world cancer genomic data
Abstract
Machine Learning (ML) has achieved unprecedented performance in several applications, including image, speech, text, and data analysis. The use of ML to understand underlying patterns in gene mutations (genomics) has far-reaching implications, not only for overcoming diagnostic pitfalls but also for designing treatments for life-threatening diseases such as cancer. The success and sustainability of ML algorithms depend on the quality and diversity of the data collected and used for training. Under-representation of groups (ethnic groups, gender groups, etc.) in such a dataset can lead to inaccurate predictions for certain groups, which can further exacerbate systemic discrimination. In this work, we propose TRAPDOOR, a methodology for identifying biased datasets by repurposing a technique that has mostly been proposed for nefarious purposes: neural network backdoors. We consider a typical…
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · AI in cancer detection
