SpecX: A Large-Scale Benchmark for Multi-Modal Spectroscopy and Cross-Paradigm Evaluation

Chengrui Xiang; Tengfei Ma; Yujie Chen; Tong Wang; Haowen Chen; Xiangxiang Zeng

arXiv:2605.18791·eess.IV·May 20, 2026

TL;DR

SpecX is a comprehensive large-scale benchmark dataset for multi-modal spectroscopy, enabling evaluation of models across diverse spectral modalities and tasks.

Contribution

Introduces SpecX, the first large-scale, multi-modal spectroscopy benchmark with cross-paradigm evaluation and diverse spectral modalities.

Findings

1. Specialized models excel at signal-level tasks.
2. MLLMs are strong in high-level reasoning.
3. SpecX highlights the gap in spectrum-native foundation models.

Abstract

Existing spectral benchmarks are limited in scale, modality alignment, and evaluation scope, and typically focus on either specialized models or multimodal language models (MLLMs). We introduce SpecX, a large-scale benchmark for multi-modal spectroscopy with cross-paradigm evaluation. SpecX contains 1.7M molecules with diverse spectral modalities, including NMR (1H, 13C, HSQC), IR, MS, UV, Raman, and FL, and is organized into three tiers: a large-scale dataset for pretraining, an aligned multi-spectral subset for benchmarking, and a high-quality experimental subset for evaluation. SpecX supports a range of tasks such as molecular elucidation, spectrum simulation, and spectral understanding, and enables unified evaluation across both specialized spectral models and MLLMs. Experiments show that specialized models excel at signal-level modeling, while MLLMs exhibit strengths in high-level…
