Invariant Risk Minimisation for Cross-Organism Inference: Substituting Mouse Data for Human Data in Human Risk Factor Discovery
Odhran O'Donoghue, Paul Duckworth, Giuseppe Ughi, Linus Scheibenreif, Kia Khezeli, Adrienne Hoarfrost, Samuel Budd, Patrick Foley, Nicholas Chia, John Kalantari, Graham Mackintosh, Frank Soboczenski, Lauren Sanders

TL;DR
This paper applies Invariant Risk Minimisation to integrate human, mouse, and in-vitro data for gene discovery in cancer, demonstrating potential for cross-organism inference despite data validity challenges.
Contribution
It introduces a novel IRM-based approach for cross-organism data integration and provides new homologue gene-matched datasets for the research community.
Findings
IRM identifies invariant features across species.
Partial consistency observed between human and mouse data.
Enhanced datasets are publicly available for further research.
Abstract
Human medical data can be challenging to obtain due to data privacy concerns, difficulties conducting certain types of experiments, or prohibitive associated costs. In many settings, data from animal models or in-vitro cell lines are available to help augment our understanding of human data. However, these data are known to have low etiological validity compared with human data. In this work, we augment small human medical datasets with in-vitro data and animal models. We use Invariant Risk Minimisation (IRM) to elucidate invariant features by treating cross-organism data as belonging to different data-generating environments. Our models identify genes of relevance to human cancer development. We observe a degree of consistency when varying the amounts of human and mouse data used; however, further work is required to obtain conclusive insights. As a secondary contribution, we…
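To make the idea of treating each organism as a separate environment concrete, the following is a minimal sketch of an IRMv1-style objective (per-environment risk plus a squared-gradient invariance penalty). It is an illustration under assumptions, not the authors' implementation: the linear model, the logistic loss, the function names `env_risk_and_penalty` and `irm_objective`, the weight `lam`, and the toy numbers are all hypothetical.

```python
import math


def _softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def env_risk_and_penalty(logits, labels):
    """Logistic risk and IRMv1 penalty for a single environment.

    The penalty is the squared derivative of the risk with respect to a
    dummy scalar multiplier w on the logits, evaluated at w = 1.
    """
    n = len(logits)
    # Mean binary cross-entropy written as softplus(z) - y * z.
    risk = sum(_softplus(z) - y * z for z, y in zip(logits, labels)) / n
    # d/dw of the risk at w = 1 is mean((sigmoid(z) - y) * z).
    grad = sum((_sigmoid(z) - y) * z for z, y in zip(logits, labels)) / n
    return risk, grad ** 2


def irm_objective(envs, weights, lam=1.0):
    """Sum over environments of risk + lam * penalty for a linear model."""
    total = 0.0
    for features, labels in envs.values():
        logits = [sum(w * x for w, x in zip(weights, row)) for row in features]
        risk, penalty = env_risk_and_penalty(logits, labels)
        total += risk + lam * penalty
    return total


# Toy, made-up data: two "gene" features per sample, binary label.
envs = {
    "human": ([[1.0, 0.2], [-0.8, 0.1]], [1, 0]),
    "mouse": ([[0.9, -0.5], [-1.1, 0.7]], [1, 0]),
    "in_vitro": ([[1.2, 0.0], [-0.7, -0.3]], [1, 0]),
}
objective = irm_objective(envs, weights=[0.5, 0.0], lam=10.0)
```

In this sketch a larger `lam` pushes the model towards predictors whose optimal scaling is the same in every environment, which is the sense in which features are "invariant" across human, mouse, and in-vitro data.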
Taxonomy
Topics: Gene expression and cancer classification · Metabolomics and Mass Spectrometry Studies · Bioinformatics and Genomic Networks
