PyTDC: A multimodal machine learning training, evaluation, and inference platform for biomedical foundation models
Alejandro Velez-Arce, Jesus Caraballo, Marinka Zitnik

TL;DR
PyTDC is an open-source platform that streamlines training, evaluation, and inference for multimodal biomedical AI models, enabling advanced research in drug discovery and biological data integration.
Contribution
It introduces the first comprehensive platform for multimodal biomedical AI, unifying heterogeneous data sources and benchmarking, and presents a novel case study on single-cell drug-target nomination.
Findings
State-of-the-art methods perform poorly on the case study.
A context-aware geometric deep learning method outperforms existing methods.
However, this method does not generalize to unseen cell types or incorporate additional modalities.
Abstract
Existing biomedical benchmarks do not provide end-to-end infrastructure for training, evaluation, and inference of models that integrate multimodal biological data and a broad range of machine learning tasks in therapeutics. We present PyTDC, an open-source machine-learning platform providing streamlined training, evaluation, and inference software for multimodal biological AI models. PyTDC unifies distributed, heterogeneous, continuously updated data sources and model weights and standardizes benchmarking and inference endpoints. This paper discusses the components of PyTDC's architecture and, to our knowledge, the first-of-its-kind case study on the introduced single-cell drug-target nomination ML task. We find state-of-the-art methods in graph representation learning and domain-specific methods from graph theory perform poorly on this task. Though we find a context-aware geometric…
Taxonomy
Topics: Single-cell and spatial transcriptomics · Cell Image Analysis Techniques · Bioinformatics and Genomic Networks
