P-MOSS: Scheduling Main-Memory Indexes Over NUMA Servers Using Next Token Prediction

Yeasir Rayhan; Walid G. Aref

arXiv:2411.02933 · cs.DB · January 22, 2026

TL;DR

P-MOSS is a learned scheduling framework that decides where queries execute on NUMA servers. It draws on hardware statistics and on techniques borrowed from large language models, and it improves query throughput by up to 6x.

Contribution

It introduces a learned, hardware-aware method for scheduling DBMS queries on NUMA architectures, built on the next-token prediction techniques used in language models.
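The following minimal Python sketch illustrates the next-token prediction framing. It is not the P-MOSS model. It assumes a hypothetical vocabulary in which each token is a logical core id, and it emits one token per index slice. The scorer, `make_load_balancing_scorer`, is a hypothetical hand-written stand-in for a learned Transformer, and decoding is plain greedy argmax.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer type (assumption, not P-MOSS's API): given the core
# tokens emitted so far, return one score per candidate logical core.
Scorer = Callable[[Sequence[int]], List[float]]


def greedy_decode(num_slices: int, scorer: Scorer) -> List[int]:
    """Emit one core token per index slice, left to right, by greedy argmax."""
    tokens: List[int] = []
    for _ in range(num_slices):
        scores = scorer(tokens)
        # max() returns the first maximal item, so ties go to the lowest core id.
        best = max(range(len(scores)), key=lambda c: scores[c])
        tokens.append(best)
    return tokens


def make_load_balancing_scorer(num_cores: int) -> Scorer:
    """Toy stand-in for a learned model: prefers the least-used core so far."""
    def scorer(prefix: Sequence[int]) -> List[float]:
        counts = [0] * num_cores
        for core in prefix:
            counts[core] += 1
        # Fewer slices already on a core means a higher (less negative) score.
        return [-float(c) for c in counts]
    return scorer


if __name__ == "__main__":
    print(greedy_decode(6, make_load_balancing_scorer(4)))
```

With four cores and six slices, the toy scorer spreads slices round-robin and yields `[0, 1, 2, 3, 0, 1]`. A learned model would instead condition each prediction on richer inputs, such as the hardware statistics mentioned above. The autoregressive loop itself would stay the same.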

Findings

1. Up to 6x increase in query throughput with P-MOSS.
2. Effective adaptation to cross-hardware and workload variations.
3. Improved data locality and core utilization.

Abstract

Ever since Dennard scaling broke down in the early 2000s and CPU frequencies stalled, vendors have increased the core count of each CPU chip at the expense of introducing heterogeneity, ushering in the era of NUMA and chiplet processors. Since then, hardware heterogeneity has only grown, to the point that DBMS performance on modern servers may vary by up to an order of magnitude. Two important factors that affect performance are the location of the logical cores where DBMS queries execute and the location where the data resides. This paper introduces P-MOSS, a learned spatial scheduling framework that schedules query execution to specific logical cores and co-locates the data on the corresponding NUMA node. For cross-hardware and workload adaptability, P-MOSS leverages core principles from Large Language Models,…
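The following Python sketch shows the bookkeeping behind co-locating data with its scheduled core. It is a minimal illustration under stated assumptions, not P-MOSS's placement mechanism. It assumes that the core-to-node topology comes from Linux's per-node `cpulist` strings (as exposed under `/sys/devices/system/node/nodeN/cpulist`). It also assumes that a schedule maps each index slice to a logical core. `colocate` and `local_fraction` are hypothetical helpers, and actually moving the memory is out of scope.

```python
from typing import Dict, List


def parse_cpulist(text: str) -> List[int]:
    """Parse a Linux cpulist string such as '0-3,8,10-11' into core ids."""
    cores: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cores.extend(range(int(lo), int(hi) + 1))
        else:
            cores.append(int(part))
    return cores


def build_core_to_node(node_cpulists: Dict[int, str]) -> Dict[int, int]:
    """Invert {numa_node: cpulist} into {logical_core: numa_node}."""
    mapping: Dict[int, int] = {}
    for node, cpulist in node_cpulists.items():
        for core in parse_cpulist(cpulist):
            mapping[core] = node
    return mapping


def colocate(schedule: Dict[str, int], core_to_node: Dict[int, int]) -> Dict[str, int]:
    """Hypothetical helper: place each slice's data on its core's NUMA node."""
    return {slice_id: core_to_node[core] for slice_id, core in schedule.items()}


def local_fraction(schedule: Dict[str, int], data_node: Dict[str, int],
                   core_to_node: Dict[int, int]) -> float:
    """Fraction of slices whose data sits on the node of their executing core."""
    if not schedule:
        return 1.0
    local = sum(1 for s, core in schedule.items() if data_node[s] == core_to_node[core])
    return local / len(schedule)


if __name__ == "__main__":
    topo = build_core_to_node({0: "0-3", 1: "4-7"})  # toy two-socket machine
    sched = {"s0": 1, "s1": 5, "s2": 6}
    print(colocate(sched, topo))                        # data follows the core
    print(local_fraction(sched, {"s0": 0, "s1": 0, "s2": 0}, topo))  # all on node 0
```

In the toy two-node example, putting all data on node 0 leaves only one of three slices local. Placing each slice on the node of its scheduled core makes every access node-local. This captures the locality goal the abstract describes for scheduling and data co-location.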

Taxonomy

Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques

Methods: Linear Layer · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Attention Is All You Need · Multi-Head Attention · Residual Connection · Byte Pair Encoding · Dropout · Absolute Position Encodings