Gini in a Bottleneck: Sparse Molecular Representations for Graph Convolutional Neural Networks
Ryan Henderson, Djork-Arné Clevert, Floriane Montanari

TL;DR
This paper introduces a Gini index-based constraint for graph convolutional neural networks that produces sparse, interpretable molecular representations without degrading evaluation metrics for some targets. The approach is evaluated on a public quantum chemistry dataset and on proprietary drug discovery data.
Contribution
It proposes a novel Gini index constraint that improves the interpretability of molecular graph neural networks while maintaining performance, demonstrated on the public QM9 dataset and on proprietary ADMET data.
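For reference, one standard way to write the Gini index of a weight vector is shown below. The page does not specify which weights are constrained or how the constraint enters training, so the penalty term λ(1 − G) in the second line is an illustrative assumption, not necessarily the paper's formulation. Here w_(1) ≤ … ≤ w_(n) are the magnitudes |w_j| sorted in ascending order; G = 0 when all magnitudes are equal and reaches its maximum (n − 1)/n when only one entry is nonzero.

$$
G(\mathbf{w}) = \frac{\sum_{i=1}^{n} (2i - n - 1)\, w_{(i)}}{n \sum_{i=1}^{n} w_{(i)}},
\qquad
\mathcal{L} = \mathcal{L}_{\text{task}} + \lambda \bigl(1 - G(\mathbf{w})\bigr)
$$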
Findings
The Gini constraint does not degrade evaluation metrics for some targets.
Enables visually interpretable molecular representations.
Chemists' assessments align with model-identified regions.
Abstract
Due to the nature of deep learning approaches, it is inherently difficult to understand which aspects of a molecular graph drive the predictions of the network. As a mitigation strategy, we constrain certain weights in a multi-task graph convolutional neural network according to the Gini index to maximize the "inequality" of the learned representations. We show that this constraint does not degrade evaluation metrics for some targets, and allows us to combine the outputs of the graph convolutional operation in a visually interpretable way. We then perform a proof-of-concept experiment on quantum chemistry targets on the public QM9 dataset, and a larger experiment on ADMET targets on proprietary drug-like molecules. Since a benchmark of explainability in the latter case is difficult, we informally surveyed medicinal chemists within our organization to check for agreement between regions…
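The abstract describes constraining weights by the Gini index so that the learned representations become "unequal", that is, sparse. The following is a minimal NumPy sketch under stated assumptions. The toy mean-aggregation graph convolution, the choice to penalize `1 - gini(w)` with weight `lam`, and the helper names `gini`, `graph_conv`, and `regularized_loss` are hypothetical illustrations, not the authors' architecture or training objective. Because it uses NumPy only, it evaluates the objective but does not compute gradients. A real training loop would express the same quantities in an autodiff framework.

```python
import numpy as np


def gini(x: np.ndarray) -> float:
    """Gini index of the magnitudes of x: 0 when all entries are equal,
    up to (n - 1) / n when a single entry holds all the mass."""
    v = np.sort(np.abs(np.ravel(x)))          # ascending magnitudes
    n = v.size
    total = v.sum()
    if total == 0:
        return 0.0                            # all-zero vector: define as perfectly equal
    i = np.arange(1, n + 1)                   # 1-based ranks
    return float(np.sum((2 * i - n - 1) * v) / (n * total))


def graph_conv(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Toy graph convolution (hypothetical): ReLU(D^-1 (A + I) H W),
    i.e. a mean over each atom's neighbours including itself."""
    a = adj + np.eye(adj.shape[0])            # add self-loops
    a = a / a.sum(axis=1, keepdims=True)      # row-normalise by degree of A + I
    return np.maximum(a @ h @ w, 0.0)


def regularized_loss(task_loss: float, w: np.ndarray, lam: float = 0.1) -> float:
    """Hypothetical objective: task loss plus a penalty that shrinks as the
    Gini index of the constrained weights grows (i.e. as they get sparser)."""
    return task_loss + lam * (1.0 - gini(w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 3-atom chain (e.g. a C-C-O heavy-atom skeleton) with 4 input features per atom.
    adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
    h = rng.normal(size=(3, 4))
    w = rng.normal(size=(4, 8))

    node_emb = graph_conv(adj, h, w)          # shape (3, 8): one vector per atom
    print("Gini of weights:", round(gini(w), 3))
    print("Gini of atom embeddings:", [round(gini(r), 3) for r in node_emb])
    print("Penalised loss:", round(regularized_loss(1.0, w), 3))
```

The sort-based form is convenient because it is differentiable almost everywhere (away from ties and zero entries), so the same expression can be ported directly to an autodiff library if the penalty is used during training.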
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Protein Structure and Dynamics
