Modelling-based experiment retrieval: A case study with gene expression clustering

Paul Blomstedt; Ritabrata Dutta; Sohan Seth; Alvis Brazma; Samuel Kaski

arXiv:1505.05007·stat.ML·January 8, 2016

TL;DR

This paper introduces a scalable, probabilistic model-based method for retrieving gene expression experiments by clustering and denoising data, improving relevance over traditional profile-based approaches.

Contribution

It proposes a novel retrieval framework using denoised models and clustering, enhancing accuracy and scalability for gene expression experiment retrieval.

Findings

01. Denoising improves retrieval accuracy in noisy datasets.

02. Heuristic clustering approximates full probabilistic inference effectively.

03. The method is scalable and compatible with standard software tools.

Abstract

Motivation: Public and private repositories of experimental data are growing to sizes that require dedicated methods for finding relevant data. To improve on the state of the art of keyword searches from annotations, methods for content-based retrieval have been proposed. In the context of gene expression experiments, most methods retrieve gene expression profiles, requiring each experiment to be expressed as a single profile, typically of case vs. control. A more general, recently suggested alternative is to retrieve experiments whose models are good for modelling the query dataset. However, for very noisy and high-dimensional query data, this retrieval criterion turns out to be very noisy as well.

Results: We propose doing retrieval using a denoised model of the query dataset, instead of the original noisy dataset itself. To this end, we introduce a general probabilistic framework,…
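The retrieval criterion mentioned in the Motivation (rank stored experiments by how well their models explain the query dataset) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: each experiment is modelled by an independent (diagonal) Gaussian, the data are synthetic, and the names fit_diag_gaussian, mean_log_likelihood, rank_experiments and make_samples are hypothetical. The sketch covers only the likelihood-based criterion, not the denoised-query framework the authors propose.

```python
import math
import random


def fit_diag_gaussian(samples):
    """Fit a per-dimension mean and variance to the rows of `samples`."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(dim)]
    # Floor the variance to keep the log-density finite.
    variances = [
        max(sum((row[j] - means[j]) ** 2 for row in samples) / n, 1e-6)
        for j in range(dim)
    ]
    return means, variances


def mean_log_likelihood(model, samples):
    """Average log-density of the rows of `samples` under a fitted model."""
    means, variances = model
    total = 0.0
    for row in samples:
        for x, m, v in zip(row, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(samples)


def rank_experiments(models, query):
    """Return experiment names, best-explaining model first."""
    scores = {name: mean_log_likelihood(m, query) for name, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)


def make_samples(rng, center, n=50, dim=5, noise=1.0):
    """Draw synthetic 'expression' rows around a common center (toy data)."""
    return [[rng.gauss(center, noise) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
# Hypothetical repository of three experiments with different centers.
repository = {
    "exp_a": make_samples(rng, 0.0),
    "exp_b": make_samples(rng, 3.0),
    "exp_c": make_samples(rng, -3.0),
}
models = {name: fit_diag_gaussian(data) for name, data in repository.items()}

# A small query dataset generated like exp_b.
query = make_samples(rng, 3.0, n=10)
ranking = rank_experiments(models, query)
print(ranking)
```

In this toy setting the experiment whose model was fitted to data resembling the query should rank first. With few, noisy, high-dimensional query samples the scores become unstable, which is the weakness the abstract attributes to this criterion.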
