General Intelligence-based Fragmentation (GIF): A framework for peak-labeled spectra simulation
Margaret R. Martin, Soha Hassoun

TL;DR
This paper introduces GIF, a structured framework leveraging large language models for simulating mass spectra in metabolomics, improving annotation accuracy and enabling explainable reasoning.
Contribution
GIF provides a systematic prompting approach for simulating mass spectra with LLMs, outperforming existing models and several deep learning baselines in metabolomics applications.
Findings
GPT-4o achieves a cosine similarity of 0.36 between simulated and true spectra
GIF outperforms several deep learning baselines
GIF enables human-in-the-loop workflows and explainable reasoning
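For readers unfamiliar with the metric in the first finding, the following is a minimal sketch of how a cosine similarity between a simulated and a true spectrum can be computed. It is not the authors' evaluation code: the function names, the (m/z, intensity) peak-list representation, and the 1.0 Da bin width are all assumptions made for illustration, and the paper may bin or match peaks differently.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    `peaks` is a list of (mz, intensity) pairs. The bin width is an
    illustrative assumption, not a value taken from the paper.
    """
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(simulated, reference, bin_width=1.0):
    """Cosine similarity between two binned spectra (0.0 if either is empty)."""
    a = bin_spectrum(simulated, bin_width)
    b = bin_spectrum(reference, bin_width)
    # Only bins present in both spectra contribute to the dot product.
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy spectra: one shared peak near m/z 50, one unshared peak each.
simulated = [(50.1, 1.0), (100.2, 1.0)]
reference = [(50.3, 1.0), (150.0, 1.0)]
score = cosine_similarity(simulated, reference)
print(round(score, 3))
```

A score of 1 means the binned intensity patterns are proportional to each other, and 0 means no bins are shared, so a value such as 0.36 indicates partial overlap between simulated and measured peaks.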
Abstract
Despite growing reference libraries and advanced computational tools, progress in the field of metabolomics remains constrained by low rates of annotating measured spectra. The recent developments of large language models (LLMs) have led to strong performance across a wide range of generation and reasoning tasks, spurring increased interest in LLMs' application to domain-specific scientific challenges, such as mass spectra annotation. Here, we present a novel framework, General Intelligence-based Fragmentation (GIF), that guides pretrained LLMs through spectra simulation using structured prompting and reasoning. GIF utilizes tagging, structured inputs/outputs, system prompts, instruction-based prompts, and iterative refinement. Indeed, GIF offers a structured alternative to ad hoc prompting, underscoring the need for systematic guidance of LLMs on complex scientific tasks. Using GIF, we…
Taxonomy
Topics: Machine Learning in Materials Science · Metabolomics and Mass Spectrometry Studies · Computational Drug Discovery Methods
