A Fully GPU-Accelerated Framework for High-Performance Configuration Interaction Selection with Neural Network Quantum States

Daran Sun; Bowen Kan; Haoquan Long; Hairui Zhao; Haoxu Li; Yicheng Liu; Pengyu Zhou; Ankang Feng; Wenjing Huang; Yida Gu; Zhenyu Li; Honghui Shang; Yunquan Zhang; Dingwen Tao; Ninghui Sun; Guangming Tan

arXiv:2604.15768 · cs.DC · April 28, 2026


TL;DR

This paper presents QiankunNet-cuSCI, a fully GPU-accelerated framework for neural network quantum state selected configuration interaction (NNQS-SCI), improving scalability and speed when simulating complex many-body quantum systems.

Contribution

It introduces a GPU-centric SCI framework with distributed de-duplication, specialized CUDA kernels, and GPU memory management, overcoming CPU-GPU bottlenecks in existing methods.
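One common way to de-duplicate configurations globally without a central host is hash partitioning. Each rank removes its own duplicates, sends every remaining configuration to the single rank that "owns" it according to a hash, and that owner removes duplicates that arrived from different ranks. The sketch below simulates this in a single process. It is an assumption for illustration, not the paper's algorithm: the function names `owner_rank` and `distributed_dedup`, the Fibonacci-hashing multiplier, and the encoding of a configuration as an integer occupation bitstring are all hypothetical.

```python
# Minimal single-process sketch of hash-partitioned global de-duplication.
# Hypothetical: "ranks" are simulated as Python lists; a real system would
# run one rank per GPU and exchange data with a collective all-to-all.

GOLDEN = 0x9E3779B97F4A7C15  # 64-bit Fibonacci-hashing multiplier
MASK64 = (1 << 64) - 1


def owner_rank(config: int, num_ranks: int) -> int:
    """Map a configuration (occupation bitstring stored as an int) to its owning rank."""
    mixed = (config * GOLDEN) & MASK64  # scramble bits so similar configs spread out
    return (mixed >> 32) % num_ranks    # use the well-mixed high bits


def distributed_dedup(local_batches: list[list[int]]) -> list[list[int]]:
    """Return, for each rank, the sorted unique configurations that rank owns."""
    num_ranks = len(local_batches)
    # Steps 1 and 2: de-duplicate locally, then route each survivor to its
    # owner (this loop stands in for the all-to-all exchange).
    inboxes: list[list[int]] = [[] for _ in range(num_ranks)]
    for batch in local_batches:
        for config in set(batch):
            inboxes[owner_rank(config, num_ranks)].append(config)
    # Step 3: each owner removes duplicates that arrived from different ranks.
    return [sorted(set(box)) for box in inboxes]


if __name__ == "__main__":
    batches = [[0b0011, 0b0101, 0b0011], [0b0101, 0b1001], [0b1100, 0b0011]]
    for rank, owned in enumerate(distributed_dedup(batches)):
        print(rank, [format(c, "04b") for c in owned])
```

Because every copy of a configuration hashes to the same owner, the union of the per-rank outputs contains each configuration exactly once. A well-mixed hash spreads configurations roughly evenly across ranks in expectation, but this is not a load-balancing guarantee; on real hardware the routing loop would be replaced by a collective such as `MPI_Alltoallv`.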

Findings

01. Achieves up to a 2.32× speedup over the baseline on 64 GPUs.

02. Maintains over 90% parallel efficiency in strong scaling.

03. Enables quantum simulations with larger configuration spaces.

Abstract

AI-driven methods have demonstrated considerable success in tackling the central challenge of accurately solving the Schrödinger equation for complex many-body systems. Among neural network quantum state (NNQS) approaches, the NNQS-SCI (Selected Configuration Interaction) method stands out as a state-of-the-art technique, recognized for its high accuracy and scalability. However, its application to larger systems is severely constrained by its hybrid CPU-GPU architecture. Specifically, centralized CPU-based global de-duplication creates a severe scalability barrier due to communication bottlenecks, while host-resident coupled-configuration generation incurs prohibitive computational overhead. We introduce QiankunNet-cuSCI, a fully GPU-accelerated SCI framework designed to overcome these bottlenecks. It first integrates a distributed, load-balanced global de-duplication algorithm to…
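For a Hamiltonian with at most two-body interactions, the Slater–Condon rules imply that a configuration couples only to configurations differing from it by at most two spin orbitals, that is, its single and double excitations. The sketch below enumerates that coupled set for one configuration encoded as an integer occupation bitstring. It is a simplified illustration under stated assumptions, not the paper's kernel: the function name `coupled_configurations` is hypothetical, and the sketch ignores spin and point-group symmetry as well as any screening of zero or small Hamiltonian matrix elements.

```python
from itertools import combinations


def coupled_configurations(config: int, n_orbitals: int) -> set[int]:
    """All configurations reachable from `config` by moving one or two electrons.

    Hypothetical helper: bit i of `config` is 1 if spin orbital i is occupied.
    No spin, symmetry, or matrix-element screening is applied.
    """
    occupied = [i for i in range(n_orbitals) if (config >> i) & 1]
    virtual = [a for a in range(n_orbitals) if not (config >> a) & 1]
    coupled: set[int] = set()
    # Single excitations: move one electron from occupied i to virtual a.
    for i in occupied:
        for a in virtual:
            coupled.add(config ^ (1 << i) ^ (1 << a))
    # Double excitations: move electrons from occupied i<j to virtual a<b.
    for i, j in combinations(occupied, 2):
        for a, b in combinations(virtual, 2):
            coupled.add(config ^ (1 << i) ^ (1 << j) ^ (1 << a) ^ (1 << b))
    return coupled


if __name__ == "__main__":
    # Two electrons in four spin orbitals: every other 2-electron configuration.
    print(sorted(format(c, "04b") for c in coupled_configurations(0b0011, 4)))
    # prints ['0101', '0110', '1001', '1010', '1100']
```

The number of double excitations grows roughly as N_occ² · N_vir², so doing this enumeration for many configurations per iteration is expensive. Different parent configurations also frequently produce the same coupled configuration, which is one reason a global de-duplication step follows generation.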
