Integrating Transformer and Autoencoder Techniques with Spectral Graph Algorithms for the Prediction of Scarcely Labeled Molecular Data
Nicole Hayes, Ekaterina Merkurjev, Guo-Wei Wei

TL;DR
This paper introduces graph-based models that combine a transformer, an autoencoder, and spectral graph algorithms to improve property prediction for molecular data with scarce labels, reporting high accuracy with minimal labeled data.
Contribution
It presents a novel integration of Merriman-Bence-Osher (MBO) techniques with transformer and autoencoder models for property prediction on scarcely labeled molecular data, validated on multiple benchmarks.
Findings
Models perform well with as little as 1% labeled data.
Proposed methods outperform traditional classifiers like SVM and random forests.
Experiments on five benchmark data sets support the effectiveness of the integrated approach.
Abstract
In molecular and biological sciences, experiments are expensive, time-consuming, and often subject to ethical constraints. Consequently, one often faces the challenging task of predicting desirable properties from small data sets or scarcely-labeled data sets. Although transfer learning can be advantageous, it requires the existence of a related large data set. This work introduces three graph-based models incorporating Merriman-Bence-Osher (MBO) techniques to tackle this challenge. Specifically, graph-based modifications of the MBO scheme are integrated with state-of-the-art techniques, including a home-made transformer and an autoencoder, in order to deal with scarcely-labeled data sets. In addition, a consensus technique is detailed. The proposed models are validated using five benchmark data sets. We also provide a thorough comparison to other competing methods, such as support…
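The abstract does not spell out the MBO iteration itself, so the following is only a minimal sketch of a generic graph-based MBO scheme for semi-supervised classification, not the authors' models. It alternates a diffusion step on a similarity graph (with a fidelity term that pulls labeled nodes toward their known classes) and a thresholding step that snaps each node to its most likely class. Everything here is an assumption made for illustration: the function name `graph_mbo`, the Gaussian similarity graph, the symmetric normalized Laplacian, the uniform initialization of unlabeled nodes, and the parameters `sigma`, `dt`, `mu`, `n_iter`, and `n_diffuse`. The transformer and autoencoder features described in the paper are replaced by a plain feature matrix `X`.

```python
import numpy as np

def graph_mbo(X, labels, labeled_idx, n_classes,
              sigma=1.0, dt=0.3, mu=5.0, n_iter=10, n_diffuse=3):
    """Illustrative graph MBO sketch (hypothetical helper, not the paper's code).

    X           : (n, d) feature matrix
    labels      : class index for each entry of labeled_idx
    labeled_idx : indices of the few labeled points
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    labeled_idx = np.asarray(labeled_idx)
    n = X.shape[0]

    # Dense Gaussian similarity graph (assumed; fine for small n only).
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized graph Laplacian and its spectrum.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(L)

    # One-hot targets on labeled nodes; mask selects the labeled rows.
    u_hat = np.zeros((n, n_classes))
    u_hat[labeled_idx, labels] = 1.0
    mask = np.zeros((n, 1))
    mask[labeled_idx] = 1.0

    # Unlabeled nodes start undecided (uniform), labeled nodes at their class.
    u = np.full((n, n_classes), 1.0 / n_classes)
    u[labeled_idx] = u_hat[labeled_idx]

    step = dt / n_diffuse
    denom = 1.0 + step * evals[:, None]
    for _ in range(n_iter):
        # Diffusion with fidelity: implicit in the Laplacian term,
        # explicit in the fidelity term, solved in the eigenbasis.
        for _ in range(n_diffuse):
            a = evecs.T @ u
            f = evecs.T @ (mu * mask * (u - u_hat))
            u = evecs @ ((a - step * f) / denom)
        # Thresholding: project each row onto the nearest one-hot vector.
        u = np.eye(n_classes)[np.argmax(u, axis=1)]

    return np.argmax(u, axis=1)
```

On a toy problem with two well-separated clusters and two labeled points per cluster, a call such as `graph_mbo(X, [0, 0, 1, 1], [0, 1, 20, 21], n_classes=2)` propagates the few known labels to the remaining points. A practical implementation would typically use a sparse nearest-neighbor graph and only a few leading eigenvectors instead of the full dense decomposition used here.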
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Bioinformatics · Metabolomics and Mass Spectrometry Studies
