Retrieval of Experiments with Sequential Dirichlet Process Mixtures in Model Space
Ritabrata Dutta, Sohan Seth, Samuel Kaski

TL;DR
This paper introduces a sequential Dirichlet process mixture model for retrieving relevant experiments by comparing models learned from data, enabling lifelong learning and privacy preservation in scientific databases.
Contribution
It proposes a novel supermodel approach: a Dirichlet process mixture (DPM), fitted by particle learning, that learns models of experiments sequentially without storing raw data.
Findings
Effective retrieval demonstrated on simulated data.
Successful application to molecular biology experiments.
Model adapts sequentially without storing raw measurements.
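To make the last finding concrete, here is a minimal sketch of a sequential Dirichlet process mixture update that keeps only per-component sufficient statistics and discards each observation after use. It is not the authors' particle-learning algorithm: it uses a single particle, one-dimensional Gaussian components with known variance, and a conjugate normal prior on each component mean. The class name `SequentialDPM` and the parameters `alpha`, `sigma2`, `mu0`, and `tau2` are hypothetical and chosen only for illustration.

```python
import math
import random


class SequentialDPM:
    """Single-particle sequential DP mixture of 1-D Gaussians (illustrative)."""

    def __init__(self, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=10.0, seed=0):
        self.alpha = alpha    # DP concentration parameter (assumed value)
        self.sigma2 = sigma2  # known observation variance (assumption)
        self.mu0 = mu0        # prior mean of each component mean (assumption)
        self.tau2 = tau2      # prior variance of each component mean (assumption)
        self.counts = []      # number of points assigned to each component
        self.sums = []        # sum of points assigned to each component
        self.rng = random.Random(seed)

    def _log_predictive(self, x, n, s):
        # Posterior of the component mean after n points with sum s,
        # followed by the Gaussian posterior predictive log-density of x.
        post_var = 1.0 / (1.0 / self.tau2 + n / self.sigma2)
        post_mean = post_var * (self.mu0 / self.tau2 + s / self.sigma2)
        var = post_var + self.sigma2
        return -0.5 * (math.log(2 * math.pi * var) + (x - post_mean) ** 2 / var)

    def update(self, x):
        # Unnormalised log-weights: Chinese-restaurant-process prior times
        # predictive likelihood, for each existing component ...
        logw = [math.log(n) + self._log_predictive(x, n, s)
                for n, s in zip(self.counts, self.sums)]
        # ... and for opening a new component (last entry).
        logw.append(math.log(self.alpha) + self._log_predictive(x, 0, 0.0))
        m = max(logw)
        weights = [math.exp(lw - m) for lw in logw]
        # Sample one assignment in proportion to the weights.
        k = self.rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(self.counts):
            self.counts.append(0)
            self.sums.append(0.0)
        # Only the sufficient statistics change; x itself is not stored.
        self.counts[k] += 1
        self.sums[k] += x
        return k
```

Calling `update(x)` once per incoming observation returns the sampled component index and leaves the model holding only `counts` and `sums`. A particle-learning variant would maintain many such states in parallel and resample them by predictive weight before each assignment.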
Abstract
We address the problem of retrieving relevant experiments given a query experiment, motivated by the public databases of datasets in molecular biology and other experimental sciences, and the need of scientists to relate to earlier work on the level of actual measurement data. Since experiments are inherently noisy and databases ever accumulating, we argue that a retrieval engine should possess two particular characteristics. First, it should compare models learnt from the experiments rather than the raw measurements themselves: this allows incorporating experiment-specific prior knowledge to suppress noise effects and focus on what is important. Second, it should be updated sequentially from newly published experiments, without explicitly storing either the measurements or the models, which is critical for saving storage space and protecting data privacy: this promotes life long…
Taxonomy
Topics: Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms
