QuantumLLMInstruct: A 500k LLM Instruction-Tuning Dataset with Problem-Solution Pairs for Quantum Computing
Shlomo Kashani

TL;DR
QuantumLLMInstruct introduces a vast, diverse dataset of over 500,000 instruction-following problem-solution pairs tailored for quantum computing, aiming to enhance LLM capabilities in this complex domain.
Contribution
The paper presents the creation of the largest and most comprehensive quantum computing dataset for instruction fine-tuning, utilizing a rigorous multi-stage methodology including domain-specific problem generation and self-assessment.
Findings
Dataset contains over 500,000 high-quality problem-solution pairs.
Incorporates advanced reasoning techniques like Chain-of-Thought and ToRA.
Validated for accuracy and diversity through self-assessment.
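As a minimal sketch of what an instruction-following problem-solution pair could look like when stored for fine-tuning, the snippet below serializes one record as a JSON Lines entry. The class name `QuantumPair`, the field names (`domain`, `problem`, `solution`), and the sample problem are all hypothetical illustrations; they are not taken from the dataset's actual schema or contents.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class QuantumPair:
    # Hypothetical schema: these field names are assumptions, not the dataset's own.
    domain: str    # seed domain the problem belongs to
    problem: str   # instruction / problem statement
    solution: str  # step-by-step (Chain-of-Thought style) solution text


def to_jsonl_line(pair: QuantumPair) -> str:
    """Serialize one pair as a single JSON Lines record."""
    return json.dumps(asdict(pair), ensure_ascii=False)


# Illustrative record written for this sketch, not drawn from the dataset.
example = QuantumPair(
    domain="Single-qubit gates",
    problem="Apply the Hadamard gate to |0> and state the resulting amplitudes.",
    solution="H|0> = (|0> + |1>)/sqrt(2), so both amplitudes equal 1/sqrt(2).",
)

line = to_jsonl_line(example)
record = json.loads(line)  # round-trip to confirm the record is valid JSON
```

One record per line keeps a large collection easy to stream and shard, which is a common choice for instruction-tuning corpora, though the paper summary above does not state which storage format the authors use.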
Abstract
We present QuantumLLMInstruct (QLMMI), an innovative dataset featuring over 500,000 meticulously curated instruction-following problem-solution pairs designed specifically for quantum computing, the largest and most comprehensive dataset of its kind. Originating from over 90 primary seed domains and encompassing hundreds of subdomains autonomously generated by LLMs, QLMMI marks a transformative step in the diversity and richness of quantum computing datasets. Designed for instruction fine-tuning, QLMMI seeks to significantly improve LLM performance in addressing complex quantum computing challenges across a wide range of quantum physics topics. While Large Language Models (LLMs) have propelled advancements in computational science with datasets like Omni-MATH and OpenMathInstruct, these primarily target Olympiad-level mathematics, leaving quantum computing largely unexplored. The…
Taxonomy
Topics: Computational Physics and Python Applications
