Full-Spectrum Machine Learning Diagnostics for Interstellar PAHs
Zhao Wang

TL;DR
This paper presents a machine learning approach that uses a random forest classifier to analyze the full IR spectra of interstellar PAHs. It identifies molecular size and charge state with high accuracy (F1-score of 0.963) and reveals that the spectral diagnostics of size depend on charge.
Contribution
It introduces a machine learning paradigm that treats entire IR spectra as high-dimensional fingerprints. This moves PAH diagnostics beyond the empirical band ratios used in traditional methods.
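To treat a whole spectrum as a fingerprint, every spectrum must first become a feature vector of the same length. The summary does not say how the authors discretize their spectra. The sketch below shows one way to do it under stated assumptions:

- Each spectrum is given as increasing (wavelength, intensity) samples.
- Each spectrum is linearly interpolated onto a shared wavelength grid.
- Each result is normalized to unit total intensity, so that band shape matters more than absolute strength.

`resample_spectrum` and `normalize` are hypothetical helpers, not the authors' code.

```python
from bisect import bisect_left


def resample_spectrum(wavelengths, intensities, grid):
    """Linearly interpolate one spectrum onto a shared wavelength grid.

    Hypothetical helper. Assumes `wavelengths` is strictly increasing.
    Grid points outside the sampled range are set to 0.0 (assumption:
    no emission outside the spectrum's coverage).
    """
    if len(wavelengths) != len(intensities) or len(wavelengths) < 2:
        raise ValueError("need at least two matched (wavelength, intensity) pairs")
    out = []
    for x in grid:
        if x < wavelengths[0] or x > wavelengths[-1]:
            out.append(0.0)
            continue
        i = bisect_left(wavelengths, x)  # first sample with wavelength >= x
        if wavelengths[i] == x:
            out.append(float(intensities[i]))
            continue
        x0, x1 = wavelengths[i - 1], wavelengths[i]
        y0, y1 = intensities[i - 1], intensities[i]
        t = (x - x0) / (x1 - x0)
        out.append(y0 + t * (y1 - y0))
    return out


def normalize(vector):
    """Scale a feature vector to unit total intensity (assumed preprocessing)."""
    total = sum(vector)
    return [v / total for v in vector] if total > 0 else list(vector)


if __name__ == "__main__":
    grid = [3.0 + 0.5 * k for k in range(5)]  # 3.0 ... 5.0 micron
    features = normalize(resample_spectrum([3.0, 4.0, 5.0], [0.0, 2.0, 0.0], grid))
    print(features)
```

For this toy triangular band, the interpolated values are `[0.0, 1.0, 2.0, 1.0, 0.0]`. After normalization they become `[0.0, 0.25, 0.5, 0.25, 0.0]`. Stacking such vectors row by row gives the matrix a classifier is trained on.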
Findings
Achieved an F1-score of 0.963 across 12 PAH size and charge categories
Size diagnostics depend on PAH charge state
12.5 micron feature is a versatile tracer across charge states
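The reported F1-score of 0.963 spans 12 categories, but the summary does not say how per-class scores were averaged. The sketch below computes per-class F1 = 2TP / (2TP + FP + FN). It then offers two averages:

- The macro average, an unweighted mean over classes.
- The support-weighted average.

Which of these corresponds to the published number is an assumption left open. The function names and the class labels in the example are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} using F1 = 2TP / (2TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # missed the true class t
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (every category counts equally)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by how often each class occurs in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    n = len(y_true)
    return sum(scores[c] * support[c] / n for c in scores)


if __name__ == "__main__":
    # Hypothetical size/charge labels, for illustration only.
    y_true = ["small_neutral", "small_neutral", "small_neutral", "large_cation"]
    y_pred = ["small_neutral", "small_neutral", "large_cation", "large_cation"]
    print(round(macro_f1(y_true, y_pred), 3), round(weighted_f1(y_true, y_pred), 3))
```

In this toy case the per-class scores are 0.8 and 2/3. The macro average is about 0.733, and the weighted average is about 0.767. The two schemes diverge whenever category sizes are unbalanced.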
Abstract
In the era of high-sensitivity infrared (IR) astronomy, traditional manual diagnostics are no longer sufficient to extract the complex physical insights hidden within interstellar spectra. We introduce a machine learning paradigm that bypasses the limitations of empirical band ratios by treating the complete IR spectrum of polycyclic aromatic hydrocarbons (PAHs) as a high-dimensional fingerprint. Using a random forest classifier trained on over 23,000 spectra, we achieve a robust F1-score of 0.963 across 12 size and charge categories, and the classifier maintains high performance on unseen molecular mixtures. Interrogating the model's decision-making process reveals that PAH size diagnostics are charge-dependent. Neutral PAHs are traced by C-H modes, while ionized species rely on 6-8 micron C-C morphology; however, the 12.5 micron feature remains a versatile tracer across multiple charge states. This…
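One way to give a trained forest's decisions a physical reading is to sum its per-feature importances over named spectral windows. This assumes there is one importance value per wavelength bin. For example, scikit-learn's `RandomForestClassifier` exposes impurity-based values as `feature_importances_`. The summary does not describe the authors' exact interpretation method. `band_importance` and the window boundaries below are hypothetical, chosen only to echo the bands named in the abstract.

```python
def band_importance(grid, importances, windows):
    """Sum per-bin importances falling inside each named wavelength window.

    grid: bin-centre wavelengths in micron, one per feature.
    importances: one non-negative importance per bin (e.g. from a fitted forest).
    windows: {name: (lo, hi)}, bounds inclusive. Windows may overlap, in
    which case a bin contributes to every window that contains it.
    """
    if len(grid) != len(importances):
        raise ValueError("grid and importances must have the same length")
    totals = {name: 0.0 for name in windows}
    for wavelength, importance in zip(grid, importances):
        for name, (lo, hi) in windows.items():
            if lo <= wavelength <= hi:
                totals[name] += importance
    return totals


# Hypothetical windows for illustration only; not taken from the paper.
WINDOWS = {
    "C-H stretch (~3.3 micron)": (3.2, 3.5),
    "C-C stretch (6-8 micron)": (6.0, 8.0),
    "C-H out-of-plane (11-14 micron)": (11.0, 14.0),
}

if __name__ == "__main__":
    grid = [3.3, 6.2, 7.7, 11.2, 12.5, 15.0]
    importances = [0.1, 0.2, 0.3, 0.1, 0.25, 0.05]  # made-up values
    for name, total in band_importance(grid, importances, WINDOWS).items():
        print(f"{name}: {total:.2f}")
```

Aggregating over windows also softens a known quirk of impurity-based importances: credit gets split among strongly correlated neighbouring bins. Comparing window totals computed separately for each charge state would be one way to probe the charge dependence the abstract reports. That setup is an assumption, not the authors' procedure.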
