Embedding Alignment in Code Generation for Audio
Sam Kouteili, Hiren Madhu, George Typaldos, Mark Santolucito

TL;DR
This paper examines how code embeddings relate to audio embeddings in music generation and proposes a model that aligns the two, with the aim of generating code candidates whose audio outputs are more diverse.
Contribution
The paper introduces a novel approach for learning an embedding alignment map between code and audio, addressing the difficulty of generating diverse code candidates in music-related code synthesis.
Findings
Code and audio embeddings do not exhibit a simple linear relationship.
A predictive model for code-to-audio embedding mapping can be learned.
Embedding alignment improves diversity in generated audio outputs.
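To make the diversity finding concrete, here is a minimal sketch of one way predicted audio embeddings could be used to pick varied code candidates. It is an illustration under stated assumptions, not the authors' method: it assumes some learned code-to-audio map has already produced one predicted audio embedding per candidate, and it uses greedy farthest-point selection with cosine distance as a stand-in selection rule. The names `cosine_distance` and `select_diverse` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat a zero vector as orthogonal to everything
    return 1.0 - dot / (norm_u * norm_v)


def select_diverse(pred_audio_embs, k):
    """Greedy farthest-point selection over predicted audio embeddings.

    pred_audio_embs: list of vectors, one per code candidate (assumed to come
    from a learned code-to-audio alignment model, which is not shown here).
    Returns up to k candidate indices, in the order they were picked.
    """
    n = len(pred_audio_embs)
    if n == 0 or k <= 0:
        return []
    chosen = [0]  # seed with the first candidate, e.g. the model's top-ranked one
    # min_dist[i] is the distance from candidate i to its nearest chosen candidate.
    min_dist = [cosine_distance(pred_audio_embs[0], e) for e in pred_audio_embs]
    while len(chosen) < min(k, n):
        # Pick the unchosen candidate that is farthest from everything chosen so far.
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i in range(n):
            d = cosine_distance(pred_audio_embs[nxt], pred_audio_embs[i])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


# Toy 2-D embeddings: candidates 0 and 1 are near-duplicates in audio space.
toy = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(select_diverse(toy, 3))  # → [0, 3, 2]
```

In the toy example the near-duplicate candidate 1 is skipped in favor of candidates that are predicted to sound different. The sketch only shows the selection step; how well it works in practice would depend on the quality of the learned alignment map.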
Abstract
LLM-powered code generation has the potential to revolutionize creative coding endeavors, such as live-coding, by enabling users to focus on structural motifs over syntactic details. In such domains, when prompting an LLM, users may benefit from considering multiple varied code candidates to better realize their musical intentions. Code generation models, however, struggle to present unique and diverse code candidates, with no direct insight into the code's audio output. To better establish a relationship between code candidates and produced audio, we investigate the topology of the mapping between code and audio embedding spaces. We find that code and audio embeddings do not exhibit a simple linear relationship, but supplement this with a constructed predictive model that shows an embedding alignment map could be learned. Supplementing the aim for musically diverse output, we present a…
