The Bach Doodle: Approachable music composition with machine learning at scale
Cheng-Zhi Anna Huang, Curtis Hawthorne, Adam Roberts, Monica, Dinculescu, James Wexler, Leon Hong, Jacob Howcroft

TL;DR
The paper introduces the Bach Doodle, an accessible AI-powered music composition tool that allows users to create melodies and receive harmonizations in Bach's style, leveraging optimized machine learning models in the browser.
Contribution
It presents a scalable, real-time browser-based implementation of Coconet for music harmonization, with reduced latency and size, enabling millions of user interactions and data collection.
Findings
Over 55 million queries received in three days
Users spent 350 years worth of time on the doodle
Model runtime reduced from 40s to 2s in the browser
Abstract
To make music composition more approachable, we designed the first AI-powered Google Doodle, the Bach Doodle, where users can create their own melody and have it harmonized by a machine learning model Coconet (Huang et al., 2017) in the style of Bach. For users to input melodies, we designed a simplified sheet-music based interface. To support an interactive experience at scale, we re-implemented Coconet in TensorFlow.js (Smilkov et al., 2019) to run in the browser and reduced its runtime from 40s to 2s by adopting dilated depth-wise separable convolutions and fusing operations. We also reduced the model download size to approximately 400KB through post-training weight quantization. We calibrated a speed test based on partial model evaluation time to determine if the harmonization request should be performed locally or sent to remote TPU servers. In three days, people spent 350 years…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMusic and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
