Attribute Graphs Underlying Molecular Generative Models: Path to Learning with Limited Data

Samuel C. Hoffman, Payel Das, Karthikeyan Shanmugam, Kahini Wadhawan, Prasanna Sattigeri

arXiv:2207.07174 · cs.LG · September 2, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a simple perturbation-based algorithm to uncover attribute graphs underlying molecular generative models, enabling robust property prediction under limited data and distribution shifts.

Contribution

It presents a novel method to infer attribute graphs from pre-trained autoencoders, revealing how latent variables influence multiple attributes and improving transferability in molecular property prediction.
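The perturbation idea can be illustrated with a minimal Python sketch. It assumes a decoder `decode` and an attribute function `attributes` (both hypothetical stand-ins for a pre-trained autoencoder's decoder and a molecular property calculator). It shifts one latent coordinate at a time by a fixed amount `delta` and records an edge from latent j to attribute k when the mean absolute change in attribute k exceeds a threshold `tol`. The fixed shift and threshold test are illustrative assumptions, not necessarily the criterion used in the paper.

```python
import numpy as np


def attribute_influence_graph(decode, attributes, z_base, delta=1.0, tol=1e-3):
    """Boolean matrix edges[j, k]: True if perturbing latent j changes attribute k.

    decode, attributes: hypothetical callables (latent -> sample, sample -> attribute vector).
    z_base: latent codes of shape (n_samples, latent_dim) used as perturbation starting points.
    """
    d = z_base.shape[1]
    base = np.array([attributes(decode(z)) for z in z_base])  # shape (n_samples, n_attributes)
    edges = np.zeros((d, base.shape[1]), dtype=bool)
    for j in range(d):
        z_pert = z_base.copy()
        z_pert[:, j] += delta  # intervene on latent coordinate j only
        pert = np.array([attributes(decode(z)) for z in z_pert])
        effect = np.abs(pert - base).mean(axis=0)  # mean absolute change per attribute
        edges[j] = effect > tol
    return edges


# Toy usage (hypothetical): identity "decoder" and three linear attributes over a 4-dim latent space.
decode = lambda z: z
attributes = lambda x: np.array([x[0] + x[1], 2.0 * x[1], x[2]])
z_base = np.random.default_rng(0).normal(size=(16, 4))
edges = attribute_influence_graph(decode, attributes, z_base)
print(edges.astype(int))
```

In this toy setup, latent 1 influences two attributes at once (the kind of overlap the paper emphasizes), while latent 3 influences none.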

Findings

1. The attribute graph effectively models the relationships between latent codes and molecular attributes.
2. Predictors based on the derived Markov blanket are robust to distribution shifts.
3. The method outperforms existing causal discovery and feature selection techniques in limited-data scenarios.
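In a directed acyclic graph, the Markov blanket of a node consists of its parents, its children, and its children's other parents. The sketch below computes it for a graph stored as a hypothetical `{node: set_of_parents}` dictionary. The toy graph, with exogenous latent nodes z1, z2, z3 and attributes A, B, C (including an attribute-to-attribute edge A -> C), is invented for illustration. How the paper builds its predictors from the blanket is not reproduced here; one common choice is to restrict a regressor's inputs to the blanket variables.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG given as {node: set of parent nodes}."""
    children = {node for node, ps in parents.items() if target in ps}
    co_parents = set().union(*(parents[c] for c in children))
    blanket = set(parents.get(target, set())) | children | co_parents
    blanket.discard(target)  # the target is trivially a co-parent of its own children
    return blanket


# Hypothetical toy graph: latent codes z1..z3 are exogenous (no parents).
parents = {
    "z1": set(), "z2": set(), "z3": set(),
    "A": {"z1"},
    "B": {"z1", "z2"},
    "C": {"z3", "A"},
}
print(sorted(markov_blanket(parents, "A")))  # ['C', 'z1', 'z3']
```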

Abstract

Training generative models that capture rich semantics of the data, and interpreting the latent representations encoded by such models, are important problems in unsupervised and self-supervised learning. In this work, we provide a simple algorithm that relies on perturbation experiments on the latent codes of a pre-trained generative autoencoder to uncover an attribute graph implied by the generative model. We perform perturbation experiments to check for the influence of a given latent variable on a subset of attributes. Given this, we show that one can fit an effective graphical model that encodes a structural equation model between latent codes, taken as exogenous variables, and attributes, taken as observed variables. One interesting aspect is that a single latent variable controls multiple overlapping subsets of attributes, unlike conventional approaches that try to impose full independence.…
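Once a latent-to-attribute edge mask is known, a structural equation model can be fitted one attribute at a time. The sketch below assumes the simplest case: a linear SEM in which each attribute is a linear function of its latent parents plus an intercept, fitted by ordinary least squares with `numpy.linalg.lstsq`. The paper's graphical model need not be linear, and `fit_linear_sem` and the toy data are hypothetical.

```python
import numpy as np


def fit_linear_sem(Z, A, edges):
    """Fit A[:, k] ~ Z[:, parents(k)] @ w_k + b_k independently for each attribute k.

    Z: (n, d) latent codes treated as exogenous variables.
    A: (n, m) observed attribute values.
    edges: (d, m) boolean mask; edges[j, k] is True if latent j is a parent of attribute k.
    Returns W of shape (d, m), zero wherever there is no edge, and intercepts b of shape (m,).
    """
    n, d = Z.shape
    m = A.shape[1]
    W = np.zeros((d, m))
    b = np.zeros(m)
    for k in range(m):
        pa = np.flatnonzero(edges[:, k])  # indices of the latent parents of attribute k
        X = np.column_stack([Z[:, pa], np.ones(n)])  # parent columns plus an intercept column
        coef, *_ = np.linalg.lstsq(X, A[:, k], rcond=None)
        W[pa, k] = coef[:-1]
        b[k] = coef[-1]
    return W, b


# Toy check (hypothetical data): noiseless samples from a known sparse linear SEM.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))
W_true = np.array([[1.0, 0.0], [2.0, -1.0], [0.0, 0.0]])
A = Z @ W_true + np.array([0.5, -0.3])
W, b = fit_linear_sem(Z, A, W_true != 0)
print(np.round(W, 6), np.round(b, 6))
```

Because each attribute is regressed only on its own parents, a latent code shared by several attributes (latent 1 here) receives nonzero weights in several columns of W, mirroring the overlapping attribute subsets described in the abstract.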

Taxonomy

Topics: Machine Learning in Bioinformatics · Fractal and DNA sequence analysis · Machine Learning and Algorithms