TL;DR
SPECTRA is a spectral graph generation method that improves molecular property regression for underrepresented target ranges by focusing on data scarcity and structural correspondence, achieving competitive results efficiently.
Contribution
It introduces a novel spectral, domain-aware graph generation approach combining rarity-aware budgeting, graph alignment, and spectral interpolation for better property prediction.
Findings
Outperforms state-of-the-art methods in relevant target ranges.
Requires roughly one quarter of the computational time.
Effective in property prediction benchmarks.
Abstract
Molecular property regression struggles with cases in chemically relevant target ranges that are underrepresented in datasets. Standard average error minimization approaches underperform in these highly relevant cases, and oversampling approaches lead to meaningless molecular representations. In this paper, we propose SPECTRA, a spectral, domain-aware graph generation method designed to improve the prediction of underrepresented but relevant molecular property values. It combines a rarity-aware budgeting scheme to focus generation where data are scarce, target-neighbors graph alignment to establish structural correspondence, and interpolation of Laplacian spectra, node features, and targets. Coupled with a spectral GNN using edge-aware Chebyshev convolutions, SPECTRA shows its effectiveness in property prediction benchmarks with competitive performance over leading state-of-the-art…
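The interpolation of Laplacian spectra and targets described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the two graphs have the same number of nodes and that their nodes are already aligned (the alignment step is skipped), and the function names, the QR re-orthonormalization, and the edge `threshold` are hypothetical choices made only for illustration.

```python
import numpy as np

def laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

def interpolate_graphs(adj_a, adj_b, y_a, y_b, t=0.5, threshold=0.5):
    """Blend two node-aligned graphs in the spectral domain (illustrative only)."""
    lam_a, U_a = np.linalg.eigh(laplacian(adj_a))
    lam_b, U_b = np.linalg.eigh(laplacian(adj_b))

    # Eigenvectors are only defined up to sign, so flip the columns of U_b
    # to point the same way as the matching columns of U_a before blending.
    signs = np.sign(np.sum(U_a * U_b, axis=0))
    signs[signs == 0] = 1.0
    U_b = U_b * signs

    # Linear interpolation of the eigenvalues and of the regression target.
    lam = (1.0 - t) * lam_a + t * lam_b
    y = (1.0 - t) * y_a + t * y_b

    # A blend of two orthonormal bases is not orthonormal in general;
    # QR is one simple way (an assumption here) to restore orthonormality.
    U, _ = np.linalg.qr((1.0 - t) * U_a + t * U_b)

    # Rebuild a Laplacian, then read edges off its negated off-diagonal part.
    L_mix = U @ np.diag(lam) @ U.T
    weights = -L_mix
    np.fill_diagonal(weights, 0.0)
    adj = (weights > threshold).astype(int)
    adj = np.maximum(adj, adj.T)  # keep the result symmetric
    return adj, lam, y

# Usage: interpolate between a 3-node path and a triangle.
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
adj_mid, lam_mid, y_mid = interpolate_graphs(path, triangle, 1.0, 3.0, t=0.5)
```

At `t=0` and `t=1` the sketch recovers the two input graphs, which is a quick sanity check. Nothing in it enforces chemical validity; node features would be interpolated the same way as the target.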
Peer Reviews
Decision · ICLR 2026 Conference Desk Rejected Submission
This work innovatively applies spectral graph augmentation to molecular property regression. By preserving the topological structure and chemical properties of molecular graphs while enhancing samples in sparsely populated regions of the label space, it overcomes the limitations of existing oversampling techniques in regression. The methodology is well designed, with a comprehensive workflow that includes spectral decomposition, FGW alignment, eigenvalue interpolation, and graph reconstruction.
The paper lacks statistical significance tests and experiments such as a quality analysis of the generated molecules; adding them would make the findings more persuasive. In writing and presentation, the paper has numerous minor errors in the formatting of figures and tables.
1. Using the concept of geometry-aware graph matching, a geometry-aware and feature-aware correspondence is established between nodes of two different molecular graphs. 2. An innovative spectral domain interpolation method is used to interpolate the eigenvalues and eigenvectors of the Laplacian matrix, as well as node features, ensuring sufficient validity of the generated molecules. 3. Kernel density estimation is used to analyze the distribution of target labels in the training set to iden
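The kernel density estimation over training targets mentioned in this review can be sketched as follows. This is only an illustration under stated assumptions: the Gaussian kernel, the normal-reference bandwidth rule, and the inverse-density allocation in the hypothetical `rarity_budget` function are not taken from the paper, which says only that the budgeting scheme focuses generation where data are scarce.

```python
import numpy as np

def gaussian_kde_density(targets, bandwidth=None):
    """Gaussian kernel density estimate evaluated at each training target."""
    y = np.asarray(targets, dtype=float)
    n = y.size
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed default, not from the paper).
        bandwidth = 1.06 * y.std(ddof=1) * n ** (-1.0 / 5.0)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive (targets may all be equal)")
    diffs = (y[:, None] - y[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

def rarity_budget(targets, total_budget):
    """Split a generation budget across samples in proportion to 1 / density."""
    density = gaussian_kde_density(targets)
    rarity = 1.0 / density
    weights = rarity / rarity.sum()
    # Flooring keeps the allocation within the total budget.
    return np.floor(weights * total_budget).astype(int)

# Usage: six clustered targets and one isolated target.
targets = [0.0, 0.1, 0.2, 0.1, 0.15, 0.05, 5.0]
budget = rarity_budget(targets, total_budget=100)
```

In this toy example the isolated target at 5.0 receives the largest share of the budget, which is the behavior a rarity-aware scheme is meant to produce.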
1. The paper doesn't seem to explicitly compare its performance with the mentioned "embedding-based enhancement methods." 2. Limited performance. The proposed method doesn't show a significant advantage over the baselines presented, especially for SGIR. 3. Scalability is questionable. To my knowledge, the FGW alignment and spectral decomposition used in the paper both have O(n^3) complexity. The three datasets used in the paper are all very small, raising doubts about whether the method can be
1. The proposed method is well motivated and clear. 2. The method does seem to expand chemical diversity and improve generalization in low-density target regions. 3. The authors provide convincing ablation studies for the method.
1. The method is very specific to the chemistry domain and was not shown to be generally useful outside the specific datasets and settings discussed in this work, which makes me doubt whether ICLR is the best venue for it. 2. There is no running-time analysis, and I believe the FGW alignment and spectral decompositions make the method computationally very heavy. It would be good if the authors provided some analysis of this and a comparison to other methods. 3. The empirical part is
- The authors address a gap in the literature—imbalanced regression. This is a relevant and potentially impactful contribution, particularly for low-data regimes in chemistry. - The manuscript is clearly written and well organized. - The model is extensively ablated, and the results are presented with clarity in dedicated figures. While performance is generally solid, SPECTRA does underperform one baseline (SGIR).
- The related work overlooks the field of graph diffusion models, which have set the state of the art since DiGress (2022). - The motivation for the proposed approach is unconvincing: why is graph matching required prior to interpolation? - There is no guarantee that interpolation in graph space yields molecules with the desired properties in low-density regions. In other words, interpolating graphs does not necessarily correspond to meaningful interpolation in chemical space. It doesn't even
Taxonomy
Topics: Advanced Graph Neural Networks · Computational Drug Discovery Methods · Machine Learning in Materials Science
