STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts
Joshua Opria

TL;DR
STRUM is an end-to-end audio-to-chart pipeline that converts raw recordings into playable rhythm-game charts for multiple instruments, using a multi-stage hybrid approach.
Contribution
Introduces a multi-stage hybrid pipeline that generates rhythm-game charts from raw audio without oracle metadata.
Findings
Achieves onset F1 of 0.838 for drums, 0.694 for bass, 0.651 for guitar, and 0.539 for vocals, at a +/- 100 ms tolerance with per-song global offset search.
Evaluates on a new benchmark of 30 songs with detailed ablation studies.
Provides code, models, and benchmark data for reproducibility.
Abstract
We present STRUM (Spectral Transcription and Rhythm Understanding Model), an audio-to-chart pipeline that converts raw recordings into playable Clone Hero / YARG charts for drums, guitar, bass, vocals, and keys without any oracle metadata. STRUM is a multi-stage hybrid: a two-stage CRNN onset detector and a six-model ensemble classifier for drums; neural onset detectors with monophonic pitch tracking for guitar and bass; word-aligned ASR for vocals; and spectral keyboard detection for keys. We evaluate on a 30-song in-envelope benchmark constructed by screening candidate songs on a single audio-quality criterion -- the median 1-second drum-stem RMS after htdemucs_6s source separation. On this benchmark STRUM achieves drums onset F1 = 0.838, bass F1 = 0.694, guitar F1 = 0.651, and vocals F1 = 0.539 at a +/- 100 ms tolerance with per-song global offset search. We report a complete…
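The scoring protocol named in the abstract (onset F1 at a +/- 100 ms tolerance with a per-song global offset search) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the greedy one-to-one matching, the function names `onset_f1` and `best_offset_f1`, and the search range and step (`max_offset`, `step`) are assumptions chosen for illustration, since the excerpt does not specify them.

```python
def onset_f1(ref, est, tol=0.100, offset=0.0):
    """F1 between reference and estimated onset times (seconds).

    Each reference onset can be matched by at most one estimate that
    lies within +/- tol after shifting all estimates by `offset`.
    Greedy two-pointer matching over sorted lists (an assumption; the
    paper's matching rule is not given in the excerpt).
    """
    ref = sorted(ref)
    est = sorted(t + offset for t in est)
    matched = 0
    i = j = 0
    while i < len(ref) and j < len(est):
        d = est[j] - ref[i]
        if abs(d) <= tol:
            matched += 1
            i += 1
            j += 1
        elif d < 0:
            j += 1  # estimate is too early for this and all later references
        else:
            i += 1  # reference is too early for this and all later estimates
    if matched == 0:
        return 0.0
    precision = matched / len(est)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_offset_f1(ref, est, tol=0.100, max_offset=0.5, step=0.01):
    """Grid-search one global time shift per song and keep the best F1.

    max_offset and step are hypothetical values, not taken from the paper.
    Returns (best_f1, offset_in_seconds).
    """
    n = int(round(max_offset / step))
    best_f1, best_off = -1.0, 0.0
    for k in range(-n, n + 1):
        off = k * step
        f1 = onset_f1(ref, est, tol=tol, offset=off)
        if f1 > best_f1:
            best_f1, best_off = f1, off
    return best_f1, best_off


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0, 4.0]
    estimated = [1.30, 2.30, 3.30, 5.0]  # consistently about 300 ms late
    print(onset_f1(reference, estimated))
    print(best_offset_f1(reference, estimated))
```

The offset search matters because a chart whose notes are all shifted by a constant delay would otherwise score near zero even when its rhythm is correct; searching one shift per song separates that global misalignment from per-note timing errors.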
