A Contextual Latent Space Model: Subsequence Modulation in Melodic Sequence
Taketo Akama

TL;DR
This paper introduces the contextual latent space model (CLSM), a generative model that lets users explore subsequences of a music sequence with a sense of direction, given the surrounding context. It reports smoother interpolation and higher sample quality than baselines.
Contribution
The paper proposes a contextual latent space model whose generative side is a context-informed prior and decoder and whose inference side is a context position-informed encoder, supporting subsequence generation and exploration in music.
Findings
Smoother interpolation in the latent space compared to baselines
Higher quality of generated music samples
Effective exploration of semantically similar subsequences
Abstract
Some generative models for sequences such as music and text allow us to edit only subsequences, given surrounding context sequences, which plays an important part in steering generation interactively. However, editing subsequences mainly involves randomly resampling subsequences from a possible generation space. We propose a contextual latent space model (CLSM) in order for users to be able to explore subsequence generation with a sense of direction in the generation space, e.g., interpolation, as well as exploring variations -- semantically similar possible subsequences. A context-informed prior and decoder constitute the generative model of CLSM, and a context position-informed encoder is the inference model. In experiments, we use a monophonic symbolic music dataset, demonstrating that our contextual latent space is smoother in interpolation than baselines, and the quality of…
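The interpolation idea in the abstract can be illustrated with a minimal sketch: walk a straight line between two latent codes and decode each point against the same fixed surrounding context. This is not the paper's implementation. The function names `interpolate_latents`, `modulate_subsequence`, and `toy_decode` are hypothetical, notes are assumed to be MIDI pitch integers, and `toy_decode` is only a stand-in for a trained context-informed decoder.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Notes = List[int]


def interpolate_latents(z_start: Sequence[float], z_end: Sequence[float],
                        steps: int) -> List[Vector]:
    """Return `steps` evenly spaced points from z_start to z_end, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if len(z_start) != len(z_end):
        raise ValueError("latent codes must have the same dimension")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the start code, 1.0 at the end code
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


def modulate_subsequence(decode: Callable[[Sequence[float], Notes, Notes], Notes],
                         context_before: Notes, context_after: Notes,
                         z_path: List[Vector]) -> List[Notes]:
    """Decode every latent code with the same context and splice it back in."""
    return [
        list(context_before) + list(decode(z, context_before, context_after))
        + list(context_after)
        for z in z_path
    ]


def toy_decode(z: Sequence[float], context_before: Notes,
               context_after: Notes) -> Notes:
    # Hypothetical decoder: offsets the last context pitch by each rounded
    # latent coordinate. A real decoder would be a trained network.
    anchor = context_before[-1] if context_before else 60
    return [anchor + round(v) for v in z]


path = interpolate_latents([0.0, 0.0], [4.0, -4.0], steps=3)
variants = modulate_subsequence(toy_decode, [60, 62], [67], path)
for seq in variants:
    print(seq)
```

Only the middle subsequence changes along the path; the context notes on either side stay fixed, which is the editing setting the abstract describes.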
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis
