Weakly supervised cross-modal learning in high-content screening

Watkinson Gabriel; Cohen Ethan; Bourriez Nicolas; Bendidi Ihab; Bollot Guillaume; Genovesio Auguste

arXiv:2311.04678 · cs.CV · November 14, 2023 · 1 citation

TL;DR

This paper introduces a novel weakly supervised cross-modal learning method for high-content screening, improving representation quality and batch effect mitigation in drug discovery data, with a new preprocessing approach for large datasets.

Contribution

The paper presents EMM and IMM loss functions based on CLIP for cross-modal learning, and a dataset preprocessing method that significantly reduces storage needs while preserving data integrity.
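
For orientation, the CLIP objective that EMM and IMM are said to build on is a symmetric contrastive loss over paired embeddings. The sketch below shows only that standard baseline, applied to image and molecule embeddings; it is not the authors' EMM or IMM loss, whose definitions are not given on this page. The function names, the temperature of 0.07, and the random toy embeddings are all illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_loss(image_emb, molecule_emb, temperature=0.07):
    """Standard symmetric CLIP-style loss (baseline only, not EMM/IMM).

    Row i of image_emb and row i of molecule_emb are assumed to be a
    matching pair; every other row in the batch acts as a negative.
    """
    img = l2_normalize(image_emb)
    mol = l2_normalize(molecule_emb)
    logits = img @ mol.T / temperature          # (batch, batch) similarities
    diag = np.arange(logits.shape[0])
    # Image -> molecule direction: softmax over each row.
    loss_i2m = -log_softmax(logits, axis=1)[diag, diag].mean()
    # Molecule -> image direction: softmax over each column.
    loss_m2i = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_i2m + loss_m2i)

# Toy data (hypothetical): 8 pairs of 16-dimensional embeddings.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(8, 16))
aligned = clip_loss(image_emb, image_emb + 0.01 * rng.normal(size=(8, 16)))
shuffled = clip_loss(image_emb, rng.normal(size=(8, 16)))
print(f"aligned pairs: {aligned:.4f}  unrelated pairs: {shuffled:.4f}")
```

Well-aligned pairs should give a noticeably lower loss than unrelated ones, which is the signal a contrastive objective of this kind optimizes.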

Findings

01 Enhanced cross-modal retrieval performance

02 Reduced batch effects in high-content screening data

03 Dataset size decreased from 85 TB to 7 TB without losing key information

Abstract

With the surge in available data from various modalities, there is a growing need to bridge the gap between different data types. In this work, we introduce a novel approach to learn cross-modal representations between image data and molecular representations for drug discovery. We propose EMM and IMM, two innovative loss functions built on top of CLIP that leverage weak supervision and cross-site replicates in High-Content Screening. Evaluating our model against a known baseline on cross-modal retrieval, we show that our proposed approach learns better representations and mitigates batch effects. In addition, we present a preprocessing method for the JUMP-CP dataset that reduces the required storage from 85 TB to a usable 7 TB while retaining all perturbations and most of the information content.
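
Cross-modal retrieval, the evaluation task named in the abstract, is commonly scored with top-k accuracy: for each query in one modality, check whether its true partner appears among the k most similar items in the other modality. The following is a minimal sketch of that generic metric, assuming row i of the query set matches row i of the gallery; the function name and the toy one-hot embeddings are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import numpy as np

def top_k_retrieval_accuracy(query_emb, gallery_emb, k=1):
    """Fraction of queries whose true match (same row index) is in the top k.

    Similarity is cosine similarity between L2-normalized rows.
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    g = gallery_emb / np.linalg.norm(gallery_emb, axis=1, keepdims=True)
    sims = q @ g.T                                 # (n_queries, n_gallery)
    top_k = np.argsort(-sims, axis=1)[:, :k]       # most similar first
    targets = np.arange(len(q))[:, None]
    return float((top_k == targets).any(axis=1).mean())

# Toy check (hypothetical data): one-hot "molecule" embeddings as the gallery.
gallery = np.eye(4)
perfect_queries = np.eye(4)                 # every query matches its partner
swapped_queries = np.eye(4)[[1, 0, 2, 3]]   # first two queries point at each other

print(top_k_retrieval_accuracy(perfect_queries, gallery, k=1))
print(top_k_retrieval_accuracy(swapped_queries, gallery, k=1))
```

In the second call, two of the four queries retrieve the wrong partner first, so top-1 accuracy drops to one half.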

Code & Models

Repositories

gwatkinson/jump_download (Official)

Taxonomy

Topics: Cell Image Analysis Techniques · Image Processing Techniques and Applications · Computational Drug Discovery Methods

Methods: Contrastive Language-Image Pre-training