TRAPDOOR: Repurposing backdoors to detect dataset bias in machine learning-based genomic analysis

Esha Sarkar; Michail Maniatakos

arXiv:2108.10132·cs.LG·October 22, 2021

Open Access

TL;DR

TRAPDOOR is a novel method that repurposes neural network backdoors to detect and quantify dataset bias in genomic machine learning applications, ensuring fairer and more accurate predictions.

Contribution

This work introduces TRAPDOOR, a new approach that leverages backdoor techniques to identify and measure bias in genomic datasets without affecting model performance.

Findings

01

Detects dataset bias with 100% accuracy

02

Accurately estimates the extent of bias with minimal error

03

Effective on real-world cancer genomic data

Abstract

Machine Learning (ML) has achieved unprecedented performance in several applications including image, speech, text, and data analysis. Use of ML to understand underlying patterns in gene mutations (genomics) has far-reaching results, not only in overcoming diagnostic pitfalls, but also in designing treatments for life-threatening diseases like cancer. The success and sustainability of ML algorithms depend on the quality and diversity of the data collected and used for training. Under-representation of groups (ethnic groups, gender groups, etc.) in such a dataset can lead to inaccurate predictions for certain groups, which can further exacerbate systemic discrimination issues. In this work, we propose TRAPDOOR, a methodology for identification of biased datasets by repurposing a technique that has been mostly proposed for nefarious purposes: neural network backdoors. We consider a typical…

Taxonomy

Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · AI in cancer detection