April 29, 2019

Machines Can Create Art, but Can They Jam?

Jazz composition and performance is the next frontier in creative AI

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American

As the CTO of an applied computer vision company, I spend the bulk of my time overseeing and honing full-stack AI setups that employ neural networks and computer vision algorithms to identify and analyze the content of images. In my spare hours, I play the sax with my jazz quintet, trading off solos, jams and improvisations with the other musicians. It’s a process full of emotion, surprises and communication. I have always seen it as a distinctly human enterprise—one impervious to technology’s usurping ambition.

Indeed, for a long time I saw AI’s viability only through the lens of its utilitarian advantages. The recent deluge of experiments involving neural networks and creativity (everything from writing poetry and designing mid-century furniture to generating deliberately non-derivative paintings and creating runway fashions) has started to change my view. It has made me wonder whether the same approach could be applied to jazz, and to what end. Is there any benefit to creating an AI robot, program or agent capable of passing a jazz Turing test, and is that even conceivable with today’s state-of-the-art AI? I researched the topic and spoke with experts at the nexus of AI and music from academia and business, as well as with some of my fellow musicians. Here’s what I found.

Some Music Genres Lend Themselves Better to AI than Do Others

Much has already been done at the intersection of AI and music generation. Tech giants such as Microsoft, Google, IBM Watson and Sony, along with startups such as Aiva and Amper, have commercially available technology and businesses built around AI-generated music. Last summer, YouTube star Taryn Southern released I am AI, an album created with the help of tools and technology from Aiva, Amper, Microsoft and IBM. Chances are that some of the soundtracks you hear in stores, elevators, infomercials and video games are AI-composed. Some are performed live by orchestras from AI-created scores and arrangements. Some are studio-style pop productions that come out of the computer already ultrapolished.

Apart from a few exceptions involving human intervention, what you probably won’t hear in those venues and platforms is AI-generated jazz. That’s somewhat surprising, since the sometimes unexpected output of algorithms would seem well suited to the genre’s improvisatory nature. Then again, as any professional jazz musician or club owner can attest, jazz tends to be a labor of love for performers. It serves a niche audience of enthusiasts. And it is unlikely to generate the kind of commercial urgency that automatically soundtracking YouTube videos and video games does.

But there are other challenges as well.

How Deep Is Your Learning?

DeepJazz is a 2016 project by Princeton computer science student Ji-Sung Kim that spews out piano solo variations on Pat Metheny’s “And Then I Knew.” The model was built using the MIDI file of the original Pat Metheny track as its data source, the Keras and Theano machine learning libraries, and a long short-term memory (LSTM) recurrent neural network. Recurrent neural networks (RNNs) are popular in today’s AI composition because they loop their internal state back into themselves, so each prediction depends on the notes that came before it. They are trained with a variant of backpropagation known as backpropagation through time.
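To make the “looping” idea concrete, here is a minimal sketch of a vanilla RNN’s forward pass in plain Python. It is not DeepJazz’s code: the weights are random rather than trained, the notes are hypothetical indices into a one-octave vocabulary, and the function names are illustrative. What it shows is that the hidden state `h` computed at each step is fed back in at the next step, so every prediction depends on all the notes heard so far.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def rnn_forward(notes, vocab_size, hidden_size=16, seed=0):
    """Run an untrained vanilla RNN over note indices.

    Returns one probability distribution over the next note per time step.
    """
    rng = random.Random(seed)
    W_xh = rand_matrix(hidden_size, vocab_size, rng)   # input -> hidden
    W_hh = rand_matrix(hidden_size, hidden_size, rng)  # hidden -> hidden (the loop)
    W_hy = rand_matrix(vocab_size, hidden_size, rng)   # hidden -> output
    h = [0.0] * hidden_size  # state carried from one note to the next
    outputs = []
    for note in notes:
        x = [0.0] * vocab_size
        x[note] = 1.0  # one-hot encode the current note
        # New state = current note mixed with the previous state.
        pre = [a + b for a, b in zip(matvec(W_xh, x), matvec(W_hh, h))]
        h = [math.tanh(v) for v in pre]
        outputs.append(softmax(matvec(W_hy, h)))
    return outputs

probs = rnn_forward([0, 4, 7, 11, 9, 7], vocab_size=12)
print(len(probs), len(probs[0]))  # 6 12
```

Changing the first note of a phrase changes the prediction after the last note, because the altered state ripples forward through `W_hh`.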

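That said, traditional RNNs tend to work only with short musical phrases, because during training the learning signal from distant notes fades as it is passed back through many time steps. If you’re composing anything longer than a ringtone, LSTMs come into play: their gated memory cells can retain information over much longer spans, which in principle lets them work over the course of an entire song, tackling its overall structure of verses, bridges, refrains and so on.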
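The difference can be sketched as a single LSTM cell step. This is a simplified illustration under stated assumptions (random untrained weights, biases omitted, hypothetical class and parameter names), not the configuration DeepJazz used. The key line is the cell-state update `c = f * c_prev + i * g`: because old memory is scaled by a forget gate and added to, rather than squashed through a fresh nonlinearity at every step, information can persist across many more time steps.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class LSTMCell:
    """Minimal LSTM cell with random (untrained) weights; biases omitted for brevity."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size  # every gate sees [x, h_prev]

        def mat():
            return [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(hidden_size)]

        self.W_f, self.W_i, self.W_o, self.W_c = mat(), mat(), mat(), mat()

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation: input followed by previous hidden state
        f = [sigmoid(v) for v in matvec(self.W_f, z)]    # forget gate: keep how much old memory
        i = [sigmoid(v) for v in matvec(self.W_i, z)]    # input gate: write how much new content
        o = [sigmoid(v) for v in matvec(self.W_o, z)]    # output gate: expose how much memory
        g = [math.tanh(v) for v in matvec(self.W_c, z)]  # candidate memory content
        # Additive update of the long-term memory (cell state).
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

cell = LSTMCell(input_size=12, hidden_size=8)
h, c = [0.0] * 8, [0.0] * 8
for note in [0, 4, 7, 11, 9, 7]:
    x = [0.0] * 12
    x[note] = 1.0
    h, c = cell.step(x, h, c)
print(len(h), len(c))  # 8 8
```

In a real model these gates are learned, so the network itself decides which musical context (a key, a motif, a section boundary) is worth carrying forward.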

With a little more finessing of tempo and dynamics, the expansive melodies and hopping chord progressions that DeepJazz produces could certainly pass for the real thing if you heard them on the radio, in the supermarket or on hold with customer service. But put to a Turing test (was this song created by a human or a machine?), maybe not so much. Because it was trained on only a single song, DeepJazz can only produce output that sounds similar to that one song.
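Why a single training song is so limiting is easiest to see with a much simpler model than an LSTM. The sketch below swaps in a first-order Markov chain (a deliberate simplification, not what DeepJazz uses) trained on a short hypothetical melody written as MIDI note numbers. Whatever it generates, every note-to-note step is one that already occurs in the training melody. An LSTM is far more flexible, but with one song as its entire world it is still learning from only one song’s worth of patterns.

```python
import random
from collections import defaultdict

def train_markov(melody):
    """Record which notes follow which in a single training melody."""
    transitions = defaultdict(list)
    for current, nxt in zip(melody, melody[1:]):
        transitions[current].append(nxt)
    return transitions

def generate(transitions, start, length, seed=0):
    """Random walk over the learned transitions."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        options = transitions.get(out[-1])
        if not options:  # dead end: this note never had a successor in training
            break
        out.append(rng.choice(options))
    return out

# Hypothetical melody as MIDI note numbers (not taken from any real recording).
melody = [60, 62, 64, 65, 67, 65, 64, 62, 60, 67, 64, 60]
model = train_markov(melody)
new_line = generate(model, start=60, length=16)

seen = set(zip(melody, melody[1:]))
# Every step in the "new" line already appears in the training song.
assert all(pair in seen for pair in zip(new_line, new_line[1:]))
print(new_line)
```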

What’s more, the output reduces the original guitar, bass, drums and keyboard instrumentation to just piano. Generating improvs from the original song, with its original instrumentation, would be a much more complex undertaking. After all, there’s a big difference between the fixed notes of a piano and the more malleable extended tones of the other instruments typically associated with jazz, such as trumpet, trombone and saxophone.

“The thing that makes wind instruments so hard for computers is that you’re pumping energy into them all the time, so you have continuous control and jazz players are free to use that control very expressively,” says Carnegie Mellon computer science professor Roger Dannenberg, who also plays jazz trumpet. “It’s not just figuring out what notes to play, but how to play them. You have almost infinite flexibility over vibrato, bending the pitch, and even producing sounds that instruments such as piano simply aren’t capable of.”
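Dannenberg’s point about continuous control can be illustrated with MIDI, a common way to hand music to a computer. A piano note is essentially a pitch and a strike velocity at the moment the key goes down; to approximate a horn player’s scoop and vibrato, you have to stream pitch-bend messages, whose 14-bit value runs from 0 to 16383 with 8192 meaning no bend. The sketch assumes a bend range of plus or minus 2 semitones (a common default, but set by the receiving instrument), and the function names and the vibrato rate, depth and scoop values are illustrative guesses, not measurements of any player.

```python
import math

BEND_CENTER = 8192          # 14-bit MIDI pitch-bend value meaning "no bend"
BEND_RANGE_SEMITONES = 2.0  # assumption: a common default; synths can differ

def semitones_to_bend(semitones):
    """Map a pitch offset in semitones to a 14-bit MIDI pitch-bend value."""
    value = round(BEND_CENTER + semitones / BEND_RANGE_SEMITONES * 8192)
    return max(0, min(16383, value))  # clamp to the valid 14-bit range

def vibrato_curve(duration_s, rate_hz=5.5, depth_semitones=0.3,
                  scoop_s=0.08, steps_per_s=100):
    """Pitch offsets for one sustained horn-style note: a scoop up, then vibrato."""
    curve = []
    for k in range(int(duration_s * steps_per_s)):
        t = k / steps_per_s
        scoop = -1.0 * max(0.0, 1.0 - t / scoop_s)  # start a semitone flat, slide up
        wobble = depth_semitones * math.sin(2 * math.pi * rate_hz * t)
        curve.append(scoop + wobble)
    return curve

# One saxophone-style note becomes a stream of 100 control values per second.
bends = [semitones_to_bend(s) for s in vibrato_curve(1.0)]
print(len(bends), semitones_to_bend(0.0))  # 100 8192
```

Even this covers only pitch; breath, growl and timbre changes would need still more control streams.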

Live Aids

Beyond timbre, live performance with AI “musicians” brings other challenges. Regardless of musical genre, the ad hoc, real-time communication that takes place between musicians during the collective improvisation of live jams simply isn’t there yet between machines and humans. The acoustics of a room or performance venue that affect sound, the energy of the audience and, of course, the visual cues shared between musicians cannot be accounted for by any current technology.

It would require sophisticated audio recognition that allows machines to hear and interpret the other instruments, advanced computer vision to pick up on varied and subtle visual cues, and some way to signal and communicate with the human musicians, all synced up with a real-time improvisatory algorithm. The computing power alone required to support those operations would be staggering. Much research has been done in this area, from a robot marimba to a Nintendo Wii-activated bebop improvisation generator to mash-ups of existing music software capable of robotic call-and-response solos with human musicians, but nothing yet pulls together the general AI equivalent of a worthwhile human jazz musician.
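Everything hard in that list (hearing the band, reading the room, signaling back) is missing from the toy below, which assumes the human’s phrase has already arrived as a clean list of MIDI pitches and durations in beats. The `respond` function is hypothetical: it answers by keeping the rhythm and mirroring the intervals (melodic inversion, a textbook variation device), which is an illustrative assumption, not how any of the systems mentioned here work.

```python
def respond(call):
    """Answer a phrase by keeping its rhythm and mirroring its intervals.

    `call` is a list of (midi_pitch, beats) pairs.
    """
    if not call:
        return []
    axis = call[0][0]  # invert around the phrase's first pitch
    answer = []
    for pitch, beats in call:
        mirrored = axis - (pitch - axis)  # an upward leap becomes a downward one
        answer.append((mirrored, beats))
    # End on the opening pitch so the answer "closes" the exchange.
    answer[-1] = (axis, answer[-1][1])
    return answer

call = [(62, 0.5), (65, 0.5), (69, 1.0), (67, 2.0)]
print(respond(call))  # [(62, 0.5), (59, 0.5), (55, 1.0), (62, 2.0)]
```

The gap between this rule and a bandmate who listens, anticipates and surprises is exactly the gap the paragraph above describes.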

It Don’t Mean a Thing if it Ain’t Got That Swing

Given some of the research and experiments around art created with neural networks trained on existing masterpieces, I often wonder if the same can be accomplished with the music of legendary jazz musicians. Is it possible to recreate the superfast virtuosic bebop jazz solos of Charlie Parker or the minimalist precision of Count Basie’s piano and band?

Charlie Parker’s solos can at least be written down as notes, so some of his tunes have already been transcribed into sheet music and MIDI, and some have even been fed to deep learning algorithms. But that’s still not enough training data for machines to produce new Charlie Parker solos that are compelling and Turing test–proof. And though tools exist to separate individual instrument tracks from mixed recordings, they are not yet good enough to untangle recordings of live shows, which make up a large portion of a jazz great’s oeuvre.

“That’s another big signal processing and machine learning problem that’s a very active area of research, but it’s not a solved problem,” says Dannenberg. And that’s not even taking into account subtleties of tempo, timbre, dynamics, tension, release, drama and storytelling that are unique to each live performance and recording.

You’d probably need to create your own data set from scratch: Get new jazz musicians to play each instrument in every possible Charlie Parker– or Count Basie–like way and then train algorithms on those recordings. That approach is similar to what Amper Music has done for other musical genres. It’s too late to get custom samples from Parker or Basie themselves, but not so for Joshua Redman or Kamasi Washington. Think of it as motion capture for audio.

Do We Need AI Jazz?

For now, some of the most promising research involving AI and jazz is underway at the Defense Advanced Research Projects Agency (DARPA), which is developing jazz-playing robots to study and advance communication between humans and machines. Such communication would be as useful on stage or at a highway exit clogged with merging autonomous cars as on the battlefield. As for AI-generated music itself, it still seems better suited to score-based genres such as film and classical music, or to highly produced, often synthesized and sample-heavy pop music.

I’m also curious about AI and jazz simply because it would be useful to someday have an on-demand musical partner to jam with in a seamlessly realistic way any time of the day or night, not just for pure enjoyment, but also for learning. After all, how much better could music students hone their talents if they had AI-based teachers that could provide instruction and feedback on their playing anytime? It would be a boon to music education and make practice even more practical.

We’re still a way off from either prospect. We still don’t have the slightest idea how to get AI jazz musicians to play with, or even detect, “swing,” “emotion” and “soul.” More important, can they improvise, zigging when traditional training and music theory would have them zag? As my band’s drummer puts it: “If you’re talking about live improvisation, that gets to the ultimate core of what AI is. That’s like having a relationship. It has to be 100 percent real.”

Despite the world’s current love affair with machine learning, it may not be the final approach for AI-generated music. “Deep learning in jazz has similarly downplayed the crucial rhythmic, timbral, and textural aspects of music,” says University of California, San Diego, music professor and saxophonist David Borgo, who besides being my friend also wrote a fascinating chapter on improvisation and computers in The Routledge Companion to Jazz Studies. “Research in this area has tended to focus on getting computers to play the ‘right notes,’ but we are still a long way from designing systems capable of the micro and macro temporal, timbral and textural adjustments necessary to groove together and to develop high-level collective improvisation in an unscripted fashion with human musicians (rather than insisting that human musicians improvise with, or groove to, the computer).”

In other words, even if I am lucky enough to one day get a robo–Charlie Parker on-demand bandmate, it’s still likely to be a one-sided experience until we get to the holy grail of human-level general AI.