Last month, Stanford researchers declared that a new era of artificial intelligence had arrived, one built atop colossal neural networks and oceans of data. They said a new research center at Stanford would build and study these "foundation models" of AI.
Critics of the idea surfaced quickly—including at the workshop organized to mark the launch of the new center. Some object to the limited capabilities and sometimes freakish behavior of these models; others warn of focusing too heavily on one way of making machines smarter.
“I think the term ‘foundation’ is horribly wrong,” Jitendra Malik, a professor at UC Berkeley who studies AI, told workshop attendees in a video discussion.
Malik acknowledged that one type of model identified by the Stanford researchers—large language models that can answer questions or generate text from a prompt—has great practical use. But he said evolutionary biology suggests that language builds on other aspects of intelligence like interaction with the physical world.
“These models are really castles in the air; they have no foundation whatsoever,” Malik said. “The language we have in these models is not grounded, there is this fakeness, there is no real understanding.” He declined an interview request.
A research paper coauthored by dozens of Stanford researchers describes "an emerging paradigm for building artificial intelligence systems" that it labels "foundation models." Ever-larger models have produced some impressive advances in AI in recent years, in areas such as perception and robotics as well as language.
Large language models are also foundational to big tech companies like Google and Facebook, which use them in areas like search, advertising, and content moderation. Building and training large language models can require millions of dollars' worth of cloud computing power; so far, that has limited their development and use to a handful of well-heeled tech companies.
But big models are problematic, too. Language models inherit bias and offensive text from the data they are trained on, and they have zero grasp of common sense or what is true or false. Given a prompt, a large language model may spit out unpleasant language or misinformation. There is also no guarantee that these large models will continue to produce advances in machine intelligence.
The Stanford proposal has divided the research community. “Calling them ‘foundation models’ completely messes up the discourse,” says Subbarao Kambhampati, a professor at Arizona State University. There is no clear path from these models to more general forms of AI, Kambhampati says.
Thomas Dietterich, a professor at Oregon State University and former president of the Association for the Advancement of Artificial Intelligence, says he has “huge respect” for the researchers behind the new Stanford center, and he believes they are genuinely concerned about the problems these models raise.
But Dietterich wonders whether the idea of foundation models is partly about getting funding for the resources needed to build and work on them. "I was surprised that they gave these models a fancy name and created a center," he says. "That does smack of flag planting, which could have several benefits on the fundraising side."
Stanford has also proposed the creation of a National AI Cloud to make industry-scale computing resources available to academics working on AI research projects.
Emily M. Bender, a professor in the linguistics department at the University of Washington, says she worries that the idea of foundation models reflects a bias toward investing in the data-centric approach to AI favored by industry.
Bender says it is especially important to study the risks posed by big AI models. She coauthored a paper, published in March, that drew attention to problems with large language models and contributed to the departure of two Google researchers. But she says scrutiny should come from multiple disciplines.
“There are all of these other adjacent, really important fields that are just starved for funding,” she says. “Before we throw money into the cloud, I would like to see money going into other disciplines.”
In 2021, technology’s role in how art is generated remains up for debate and discovery. From the rise of NFTs to the proliferation of techno-artists who use generative adversarial networks to produce visual expressions, to smartphone apps that write new music, creatives and technologists are continually experimenting with how art is produced, consumed, and monetized.
BT, the Grammy-nominated composer of 2010’s These Hopeful Machines, has emerged as a world leader at the intersection of tech and music. Beyond producing and writing for the likes of David Bowie, Death Cab for Cutie, Madonna, and the Roots, and composing scores for The Fast and the Furious, Smallville, and many other shows and movies, he’s helped pioneer production techniques like stutter editing and granular synthesis. This past spring, BT released GENESIS.JSON, a piece of software that contains 24 hours of original music and visual art. It features 15,000 individually sequenced audio and video clips that he created from scratch, which span different rhythmic figures, field recordings of cicadas and crickets, a live orchestra, drum machines, and myriad other sounds that play continuously. And it lives on the blockchain. It is, to my knowledge, the first composition of its kind.
Could ideas like GENESIS.JSON be the future of original music, where composers use AI and the blockchain to create entirely new art forms? What makes an artist in the age of algorithms? I spoke with BT to learn more.
What are your central interests at the interface of artificial intelligence and music?
I am really fascinated with this idea of what an artist is. Speaking in my common tongue—music—it’s a very small array of variables. We have 12 notes. There’s a collection of rhythms that we typically use. There’s a sort of vernacular of instruments, of tones, of timbres, but when you start to add them up, it becomes this really deep data set.
On its surface, it makes you ask, “What is special and unique about an artist?” And that’s something that I’ve been curious about my whole adult life. Seeing the research that was happening in artificial intelligence, my immediate thought was that music is low-hanging fruit.
These days, we can take the sum total of the artists' output and we can take their artistic works and we can quantify the entire thing into a training set, a massive, multivariable training set. And we don't even name the variables. The RNNs (recurrent neural networks) and CNNs (convolutional neural networks) name them automatically.
So you’re referring to a body of music that can be used to “train” an artificial intelligence algorithm that can then create original music that resembles the music it was trained on. If we reduce the genius of artists like Coltrane or Mozart, say, into a training set and can recreate their sound, how will musicians and music connoisseurs respond?
I think that the closer we get, it becomes this uncanny valley idea. Some would say that things like music are sacrosanct and have to do with very base-level things about our humanity. It’s not hard to get into kind of a spiritual conversation about what music is as a language, and what it means, and how powerful it is, and how it transcends culture, race, and time. So the traditional musician might say, “That’s not possible. There’s so much nuance and feeling, and your life experience, and these kinds of things that go into the musical output.”
And the sort of engineer part of me goes, well, look at what Google has made. It's a simple kind of MIDI-generation engine, where they've taken all of Bach's works and it's able to spit out [Bach-like] fugues. Because Bach wrote so many fugues, he's a great example. Also, he's the father of modern harmony. Musicologists listen to some of those Google Magenta fugues and can't distinguish them from Bach's original works. Again, this makes us question what constitutes an artist.
I’m both excited and have incredible trepidation about this space that we’re expanding into. Maybe the question I want to be asking is less “We can, but should we?” and more “How do we do this responsibly, because it’s happening?”
Right now, there are companies that are using something like Spotify or YouTube to train their models on the work of artists who are alive, whose works are copyrighted and protected. But companies are allowed to take someone's work and train models with it right now. Should we be doing that? Or should we be speaking to the artists themselves first? I believe that there need to be protective mechanisms put in place for visual artists, for programmers, for musicians.