Musical Speech: A Transformer-based Composition Tool
Jason d'Eon, Sri Harsha Dumpala, Chandramouli Shama Sastry, Dani Oore, and Sageev Oore

TL;DR
This paper introduces a Transformer-based tool that converts user-provided speech into musical outlines, enabling personalized music creation with a clear link between speech and music, without needing paired datasets.
Contribution
The paper presents a novel pipeline combining speech processing, heuristics, and Transformer models for music generation from speech without requiring paired training data.
Findings
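The pipeline's effectiveness is illustrated through examples of music created by musicians using the tool
Any user can generate musical material from their own recorded speech, and the connection between the speech and the resulting music remains audible
Training does not require a paired speech-music dataset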
Abstract
In this paper, we propose a new compositional tool that will generate a musical outline of speech recorded/provided by the user for use as a musical building block in their compositions. The tool allows any user to use their own speech to generate musical material, while still being able to hear the direct connection between their recorded speech and the resulting music. The tool is built on our proposed pipeline. This pipeline begins with speech-based signal processing, after which some simple musical heuristics are applied, and finally these pre-processed signals are passed through Transformer models trained on new musical tasks. We illustrate the effectiveness of our pipeline -- which does not require a paired dataset for training -- through examples of music created by musicians making use of our tool.
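The abstract does not spell out the signal-processing or heuristic stages, so the following is only an illustrative sketch of what an early step in such a pipeline might look like: turning a frame-wise pitch (F0) track extracted from speech into a coarse sequence of notes. The function names, the 10 ms frame size, and the minimum-duration rule are all assumptions made for this example and are not taken from the paper.

```python
import math


def hz_to_midi(f0_hz):
    """Map a frequency in Hz to the nearest MIDI note (A4 = 440 Hz = 69)."""
    return int(round(69 + 12 * math.log2(f0_hz / 440.0)))


def contour_to_notes(f0_track, frame_s=0.01, min_frames=3):
    """Collapse a frame-wise F0 track into (midi_pitch, onset_s, duration_s).

    Hypothetical heuristic: frames whose F0 is None or <= 0 count as
    unvoiced, consecutive frames that round to the same MIDI pitch are
    merged, and runs shorter than min_frames are discarded.
    """
    notes = []
    current, start, length = None, 0, 0
    # The trailing None is a sentinel that flushes the final run.
    for i, f0 in enumerate(list(f0_track) + [None]):
        voiced = f0 is not None and f0 > 0
        pitch = hz_to_midi(f0) if voiced else None
        if pitch == current:
            length += 1
            continue
        if current is not None and length >= min_frames:
            onset = round(start * frame_s, 3)
            duration = round(length * frame_s, 3)
            notes.append((current, onset, duration))
        current, start, length = pitch, i, 1
    return notes


if __name__ == "__main__":
    # Toy F0 track in Hz; zeros mark unvoiced frames.
    track = [0, 0, 220, 220, 221, 220, 0, 0, 440, 440, 440, 441, 0]
    for note in contour_to_notes(track):
        print(note)
```

In a pipeline like the one described, a note sequence of this kind would be a plausible input to the later Transformer stage, but the actual representation the authors use is not stated in the abstract.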
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Residual Connection · Softmax · Adam
