SHA-256 Infused Embedding-Driven Generative Modeling of High-Energy Molecules in Low-Data Regimes

Siddharth Verma; Alankar Alankar

arXiv:2510.25788·cs.LG·October 31, 2025

TL;DR

This paper introduces a molecular generation framework that combines fixed SHA-256 embeddings with an LSTM generator and a GNN property predictor, enabling the discovery of high-energy molecules with high validity, novelty, and diversity in low-data regimes.

Contribution

It proposes an embedding strategy that integrates fixed SHA-256 hash embeddings with partially trainable representations, improving the diversity and novelty of generated high-energy molecules without pretraining.

Findings

01. Achieved 67.5% validity and 37.5% novelty in generated molecules, without pretraining.

02. Generated molecules showed a mean Tanimoto similarity of 0.214 to the training set, indicating that they are structurally distinct from the training data.

03. Identified 37 new super explosives with high predicted detonation velocities.
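
As a rough illustration of how figures like these are typically computed, the sketch below evaluates validity, novelty, and mean Tanimoto similarity on toy data. It is not the authors' evaluation code. Several things here are assumptions: fingerprints are represented as plain sets of on-bit indices, the mean is taken over all generated/training pairs, both rates are reported as fractions of all generated strings, and `is_valid` is a hypothetical stand-in for a real chemistry-aware parser.

```python
from itertools import product


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    inter = len(a & b)
    union = len(a) + len(b) - inter
    return inter / union if union else 0.0  # convention chosen here: two empty sets score 0


def mean_tanimoto(generated: list, training: list) -> float:
    """Average similarity over every (generated, training) pair (an assumed convention)."""
    pairs = list(product(generated, training))
    return sum(tanimoto(g, t) for g, t in pairs) / len(pairs)


def validity_and_novelty(generated: list, is_valid, training: set) -> tuple:
    """Return (validity, novelty), both as fractions of all generated strings (assumed)."""
    valid = [m for m in generated if is_valid(m)]
    novel = {m for m in valid if m not in training}
    return len(valid) / len(generated), len(novel) / len(generated)


# Toy fingerprints, purely illustrative.
gen_fps = [{1, 2, 3}, {4, 5}]
train_fps = [{2, 3, 4}, {5, 6, 7, 8}]
print(round(mean_tanimoto(gen_fps, train_fps), 4))  # 0.2375

# Hypothetical validity check: balanced parentheses only, NOT real chemistry.
balanced = lambda s: s.count("(") == s.count(")")
print(validity_and_novelty(["CCO", "C(", "CC", "CCN"], balanced, {"CC"}))  # (0.75, 0.5)
```

A lower mean Tanimoto value means the generated set shares fewer fingerprint bits with the training set, which is why a low value is read as evidence of structural diversity.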

Abstract

High-energy materials (HEMs) are critical for propulsion and defense domains, yet their discovery remains constrained by limited experimental data and restricted access to testing facilities. This work presents a novel approach toward high-energy molecules by combining Long Short-Term Memory (LSTM) networks for molecular generation and Attentive Graph Neural Networks (GNN) for property prediction. We propose a transformative embedding space construction strategy that integrates fixed SHA-256 embeddings with partially trainable representations. Unlike conventional regularization techniques, this changes the representational basis itself, reshaping the molecular input space before learning begins. Without recourse to pretraining, the generator achieves 67.5% validity and 37.5% novelty. The generated library exhibits a mean Tanimoto coefficient of 0.214 relative to the training set, signifying the…
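
The idea of pairing a fixed hash-derived vector with a trainable one can be sketched in a few lines of plain Python. This is only one possible reading of the abstract, not the authors' construction: what gets hashed (single tokens here), the dimensions, the byte-to-float scaling, and the concatenation are all assumptions, and `HybridEmbedding` and `sgd_step` are hypothetical names.

```python
import hashlib
import random


def sha256_embedding(token: str, dim: int = 32) -> list[float]:
    """Deterministic, non-trainable vector: SHA-256 digest bytes scaled to [-1, 1]."""
    if not 1 <= dim <= 32:
        raise ValueError("a SHA-256 digest provides at most 32 bytes")
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [b / 127.5 - 1.0 for b in digest[:dim]]


class HybridEmbedding:
    """Concatenates a frozen hash-derived part with a small trainable part (assumed layout)."""

    def __init__(self, vocab, fixed_dim=16, trainable_dim=8, seed=0):
        rng = random.Random(seed)
        self.fixed = {t: sha256_embedding(t, fixed_dim) for t in vocab}
        self.trainable = {
            t: [rng.uniform(-0.1, 0.1) for _ in range(trainable_dim)] for t in vocab
        }

    def __call__(self, token: str) -> list[float]:
        return self.fixed[token] + self.trainable[token]

    def sgd_step(self, token: str, grad: list[float], lr: float = 0.1) -> None:
        """Apply a gradient over the full vector; only the trainable slice is updated."""
        offset = len(self.fixed[token])
        self.trainable[token] = [
            w - lr * g for w, g in zip(self.trainable[token], grad[offset:])
        ]


emb = HybridEmbedding(["C", "N", "O", "(", ")", "="])
before = emb("N")
emb.sgd_step("N", [1.0] * len(before))
after = emb("N")
print(before[:16] == after[:16], before[16:] != after[16:])  # True True
```

Because the hashed part is fixed before any training, it defines the input basis up front, while the trainable slice is the only part a gradient update can move.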
