COREY: Entropy-Guided Runtime Chunk Scheduling for Selective Scan Kernels

Bo Ma; Jinsong Wu; Weiqi Yan

arXiv:2604.10597 · cs.CV · May 5, 2026

TL;DR

COREY is an entropy-guided runtime scheduler for selective scan kernels in state space models. It lowers kernel-level latency on GPUs relative to an unoptimized baseline, but does not improve end-to-end speed over the best static chunk setting.

Contribution

The paper presents an entropy-based chunk scheduling rule that matches a one-time static oracle at the kernel level, and it examines what that result means in practice for GPU workloads.

Findings

01

Achieves 4.41× lower kernel-level latency than an unoptimized baseline on a consumer GPU.

02

The calibrated entropy rule recovers the locally optimal chunk size and matches a one-time static oracle.

03

Entropy-guided scheduling incurs overhead, which can be mitigated with fallback strategies.

Abstract

Mamba selective state space models (SSMs) provide linear-time sequence modeling but remain sensitive to selective-scan chunk scheduling. We present COREY, a concept-and-feasibility runtime scheduler that maps fixed-bin activation entropy to chunk size. We evaluate COREY in three tiers: a prototype cost model, real-checkpoint kernel timing, and routed end-to-end ablations on modern GPUs. At the kernel level, a calibrated rule, \(H_{\mathrm{ref}}=\log K\), recovers the locally optimal chunk and matches a one-time static oracle, yielding \(4.41\times\) lower latency than an unoptimized baseline on a consumer GPU and \(3.90\times\) to \(4.04\times\) lower latency on a data-center accelerator. Routing this choice into a patched live scan kernel closes the engineering loop without improving end-to-end speed: in unified routed ablations, the best static chunk outperforms all…
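
The abstract describes mapping fixed-bin activation entropy to a chunk size, with \(H_{\mathrm{ref}}=\log K\) as the reference. A minimal sketch of that idea follows, not the authors' implementation. It assumes \(K\) is the number of histogram bins, so that \(\log K\) is the maximum attainable entropy. The function names, the candidate chunk sizes, and the direction of the mapping (higher entropy selects a larger chunk) are all hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def fixed_bin_entropy(values: Sequence[float], num_bins: int,
                      lo: float, hi: float) -> float:
    """Shannon entropy (in nats) of a histogram with num_bins equal-width bins on [lo, hi]."""
    if not values:
        raise ValueError("values must be non-empty")
    if num_bins < 2 or hi <= lo:
        raise ValueError("need num_bins >= 2 and hi > lo")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Out-of-range values are clamped into the first or last bin.
        idx = min(max(int((v - lo) / width), 0), num_bins - 1)
        counts[idx] += 1
    total = len(values)
    entropy = 0.0
    for c in counts:
        if c:
            p = c / total
            entropy -= p * math.log(p)
    return entropy


def choose_chunk_size(entropy: float, num_bins: int,
                      candidates: Sequence[int] = (32, 64, 128, 256)) -> int:
    """Map entropy to a chunk size via the ratio entropy / H_ref, with H_ref = log(num_bins).

    ASSUMPTION: the candidate list and the monotone "higher entropy -> larger chunk"
    direction are illustrative; the source does not specify them.
    """
    h_ref = math.log(num_bins)
    ratio = min(max(entropy / h_ref, 0.0), 1.0)
    idx = min(int(ratio * len(candidates)), len(candidates) - 1)
    return candidates[idx]


if __name__ == "__main__":
    activations = [0.1, 0.4, 2.3, 2.9, 5.5, 6.1, 7.7, 7.9]
    h = fixed_bin_entropy(activations, num_bins=8, lo=0.0, hi=8.0)
    print(choose_chunk_size(h, num_bins=8))
```

Normalizing by \(\log K\) keeps the ratio in \([0, 1]\) regardless of the bin count, so the same lookup can be reused if the histogram resolution changes. A real scheduler would compute the histogram on the GPU and would calibrate the mapping against measured kernel timings.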

Code & Models

Repositories

mabo1215/COREY_Transformer
github
