LaMDA and the Sentient AI Trap

Now head of the nonprofit Distributed AI Research, Gebru hopes that going forward people focus on human welfare, not robot rights. Other AI ethicists have said that they’ll no longer discuss conscious or superintelligent AI at all.

“Quite a large gap exists between the current narrative of AI and what it can actually do,” says Giada Pistilli, an ethicist at Hugging Face, a startup focused on language models. “This narrative provokes fear, amazement, and excitement simultaneously, but it is mainly based on lies to sell products and take advantage of the hype.”

The consequence of speculation about sentient AI, she says, is an increased willingness to make claims based on subjective impression instead of scientific rigor and proof. It distracts from “countless ethical and social justice questions” that AI systems pose. While every researcher has the freedom to research what they want, she says, “I just fear that focusing on this subject makes us forget what is happening while looking at the moon.”

What Lemoine experienced is an example of what author and futurist David Brin has called the “robot empathy crisis.” At an AI conference in San Francisco in 2017, Brin predicted that in three to five years, people would claim AI systems were sentient and insist that they had rights. Back then, he thought those appeals would come from a virtual agent that took the appearance of a woman or child to maximize human empathic response, not “some guy at Google,” he says.

The LaMDA incident is part of a transition period, Brin says, where “we’re going to be more and more confused over the boundary between reality and science fiction.”

Brin based his 2017 prediction on advances in language models. He expects that the trend will lead to scams. If people were suckers for a chatbot as simple as ELIZA decades ago, he says, how hard will it be to persuade millions that an emulated person deserves protection or money?

“There’s a lot of snake oil out there, and mixed in with all the hype are genuine advancements,” Brin says. “Parsing our way through that stew is one of the challenges that we face.”

And as empathetic as LaMDA seemed, people who are amazed by large language models should consider the case of the cheeseburger stabbing, says Yejin Choi, a computer scientist at the University of Washington. A local news broadcast in the United States reported on a teenager in Toledo, Ohio, who stabbed his mother in the arm in a dispute over a cheeseburger. But the headline “Cheeseburger Stabbing” is vague. Knowing what occurred requires some common sense. Attempts to get OpenAI’s GPT-3 model to generate text using the prompt “Breaking news: Cheeseburger stabbing” produce words about a man getting stabbed with a cheeseburger in an altercation over ketchup, and a man being arrested after stabbing a cheeseburger.

Language models sometimes make mistakes because deciphering human language can require multiple forms of common-sense understanding. To document what large language models are capable of doing and where they can fall short, last month more than 400 researchers from 130 institutions contributed to a collection of more than 200 tasks known as BIG-Bench, or Beyond the Imitation Game. BIG-Bench includes some traditional language-model tests like reading comprehension, but also logical reasoning and common sense.

Researchers at the Allen Institute for AI’s MOSAIC project, which documents the common-sense reasoning abilities of AI models, contributed a task called Social-IQa. They asked language models (not including LaMDA) to answer questions that require social intelligence, like “Jordan wanted to tell Tracy a secret, so Jordan leaned towards Tracy. Why did Jordan do this?” The team found that large language models were 20 to 30 percent less accurate than people.

How 10 Skin Tones Will Reshape Google’s Approach to AI

For years, tech companies have relied on something called the Fitzpatrick scale to classify skin tones for their computer vision algorithms. Originally designed for dermatologists in the 1970s, the system comprises only six skin tones, a possible contributor to AI’s well-documented failures in identifying people of color. Now Google is beginning to incorporate a 10-skin tone standard across its products, called the Monk Skin Tone (MST) scale, from Google Search Images to Google Photos and beyond. The development has the potential to reduce bias in data sets used to train AI in everything from health care to content moderation.

Google first signaled plans to go beyond the Fitzpatrick scale last year. Internally, the project dates back to a summer 2020 effort by four Black women at Google to make AI “work better for people of color,” according to a Twitter thread from Xango Eyeé, a responsible AI product manager at the company. At today’s Google I/O conference, the company detailed how wide an impact the new system could have across its many products. Google will also open source the MST, meaning it could replace Fitzpatrick as the industry standard for evaluating the fairness of cameras and computer vision systems.

“Think anywhere there are images of people’s faces being used where we need to test the algorithm for fairness,” says Eyeé.

The Monk Skin Tone scale is named after Ellis Monk, a Harvard University sociologist who has spent decades researching colorism’s impact on the lives of Black people in the United States. Monk created the scale in 2019 and worked with Google engineers and researchers to incorporate it into the company’s product development.

“The reality is that life chances, opportunities, all these things are very much tied to your phenotypical makeup,” Monk said in prepared remarks in a video shown at I/O. “We can weed out these biases in our technology from a really early stage and make sure the technology we have works equally well across all skin tones. I think this is a huge step forward.”

An initial analysis by Monk and Google research scientists last year found that participants felt better represented by the MST than by the Fitzpatrick scale. In an FAQ published Wednesday, Google says that having more than 10 skin tones can add complexity without extra value, unlike in industries such as makeup, where companies like Rihanna’s Fenty Beauty offer more than 40 shades. Google is continuing work to validate the Monk Skin Tone scale in places like Brazil, India, Mexico, and Nigeria, according to a source familiar with the matter. Further details are expected soon in an academic research article.

The company will now expand its use of the MST. Google Images will offer an option to sort makeup-related search results by skin tone based on the scale, and filters for people with more melanin are coming to Google Photos later this month. Should Google adopt the 10-skin-tone scale across its product lines, it could have implications for fairly evaluating algorithms used in Google search results, Pixel smartphones, YouTube classification algorithms, Waymo self-driving cars, and more.

Musk’s Plan to Reveal the Twitter Algorithm Won’t Solve Anything

“In this age of machine learning, it isn’t the algorithms, it’s the data,” says David Karger, a professor and computer scientist at MIT. Karger says Musk could improve Twitter by making the platform more open, so that others can build on top of it in new ways. “What makes Twitter important is not the algorithms,” he says. “It’s the people who are tweeting.”

A deeper picture of how Twitter works would also mean opening up more than just the handwritten algorithms. “The code is fine; the data is better; the code and data combined into a model could be best,” says Alex Engler, a fellow in governance studies at the Brookings Institution who studies AI’s impact on society. Engler adds that understanding the decisions that Twitter’s algorithms are trained to make would also be crucial.

The machine learning models that Twitter uses are still only part of the picture, because the entire system also reacts to real-time user behavior in complex ways. If users are particularly interested in a certain news story, then related tweets will naturally get amplified. “Twitter is a socio-technical system,” says a second Twitter source. “It is responsive to human behavior.”

This fact was illustrated by research that Twitter published in December 2021 showing that right-leaning posts received more amplification than left-leaning ones, although the dynamics behind this phenomenon were unclear.

“That’s why we audit,” says Ethan Zuckerman, a professor at the University of Massachusetts Amherst who teaches public policy, communication, and information. “Even the people who build these tools end up discovering surprising shortcomings and flaws.”

One irony of Musk’s professed motives for acquiring Twitter, Zuckerman says, is that the company has been remarkably transparent about the way its algorithm works of late. In August 2021, Twitter launched a contest that gave outside researchers access to an image-cropping algorithm that had exhibited biased behavior. The company has also been working on ways to give users greater control over the algorithms that surface content, according to those with knowledge of the work.

Releasing some Twitter code would provide greater transparency, says Damon McCoy, an associate professor at New York University who studies security and privacy of large, complex systems including social networks, but even those who built Twitter may not fully understand how it works.

A concern for Twitter’s engineering team is that, amid all this complexity, some code may be taken out of context and highlighted as a sign of bias. Revealing too much about how Twitter’s recommendation system operates might also result in security problems. Access to a recommendation system would make it easier to game the system and gain prominence. It may also be possible to exploit machine learning algorithms in ways that might be subtle and hard to detect. “Bad actors right now are probing the system and testing,” McCoy says. Access to Twitter’s models “may well help outsiders understand some of the principles used to elevate some content over others.”

On April 18, as Musk was escalating his efforts to acquire Twitter, someone with access to Twitter’s GitHub, where the company already releases some of its code, created a new repository called “the algorithm,” perhaps a developer’s dig at the idea that the company could easily release details of how it works. Shortly after Musk’s acquisition was announced, it disappeared.

Additional reporting by Tom Simonite.

Russia’s Killer Drone in Ukraine Raises Fears About AI in Warfare

A Russian “suicide drone” that boasts the ability to identify targets using artificial intelligence has been spotted in images of the ongoing invasion of Ukraine.

Photographs showing what appears to be the KUB-BLA, a type of lethal drone known as a “loitering munition” sold by ZALA Aero, a subsidiary of the Russian arms company Kalashnikov, have appeared on Telegram and Twitter in recent days. The pictures show damaged drones that appear to have either crashed or been shot down.

With a wingspan of 1.2 meters, the sleek white drone resembles a small pilotless fighter jet. It is fired from a portable launcher, can travel up to 130 kilometers per hour for 30 minutes, and deliberately crashes into a target, detonating a 3-kilo explosive.

ZALA Aero, which first demoed the KUB-BLA at a Russian air show in 2019, claims in promotional material that it features “intelligent detection and recognition of objects by class and type in real time.”

The drone itself may do little to alter the course of the war in Ukraine, as there is no evidence that Russia is using them widely so far. But its appearance has sparked concern about the potential for AI to take a greater role in making lethal decisions.

“The notion of a killer robot—where you have artificial intelligence fused with weapons—that technology is here, and it’s being used,” says Zachary Kallenborn, a research affiliate with the National Consortium for the Study of Terrorism and Responses to Terrorism (START).

Advances in AI have made it easier to incorporate autonomy into weapons systems, and have raised the prospect that more capable systems could eventually decide for themselves who to kill. A UN report published last year concluded that a lethal drone with this capability may have been used in the Libyan civil war.

It is unclear whether the drone has been operated in this way in Ukraine. One of the challenges with autonomous weapons may prove to be the difficulty of determining when full autonomy is used in a lethal context, Kallenborn says.

The KUB-BLA images have yet to be verified by official sources, but the drone is known to be a relatively new part of Russia’s military arsenal. Its use would also be consistent with Russia’s shifting strategy in the face of the unexpectedly strong Ukrainian resistance, says Samuel Bendett, an expert on Russia’s military with the defense think tank CNA.

Bendett says Russia has built up its drone capabilities in recent years, using them in Syria and acquiring more after Azerbaijani forces demonstrated their effectiveness against Armenian ground military in the 2020 Nagorno-Karabakh war. “They are an extraordinarily cheap alternative to flying manned missions,” he says. “They are very effective both militarily and of course psychologically.”

The fact that Russia seems to have used few drones in Ukraine early on may be due to misjudging the resistance or because of effective Ukrainian countermeasures.

But drones have also highlighted a key vulnerability in Russia’s invasion, which is now entering its third week. Ukrainian forces have used a remotely operated Turkish-made drone called the TB2 to great effect against Russian forces, shooting guided missiles at Russian missile launchers and vehicles. The paraglider-sized drone, which relies on a small crew on the ground, is slow and cannot defend itself, but it has proven effective against a surprisingly weak Russian air campaign.

A Stanford Proposal Over AI’s ‘Foundations’ Ignites Debate

Last month, Stanford researchers declared that a new era of artificial intelligence had arrived, one built atop colossal neural networks and oceans of data. They said a new research center at Stanford would build—and study—these “foundational models” of AI.

Critics of the idea surfaced quickly—including at the workshop organized to mark the launch of the new center. Some object to the limited capabilities and sometimes freakish behavior of these models; others warn of focusing too heavily on one way of making machines smarter.

“I think the term ‘foundation’ is horribly wrong,” Jitendra Malik, a professor at UC Berkeley who studies AI, told workshop attendees in a video discussion.

Malik acknowledged that one type of model identified by the Stanford researchers—large language models that can answer questions or generate text from a prompt—has great practical use. But he said evolutionary biology suggests that language builds on other aspects of intelligence like interaction with the physical world.

“These models are really castles in the air; they have no foundation whatsoever,” Malik said. “The language we have in these models is not grounded, there is this fakeness, there is no real understanding.” He declined an interview request.

A research paper coauthored by dozens of Stanford researchers describes “an emerging paradigm for building artificial intelligence systems” that it labeled “foundational models.” Ever-larger AI models have produced some impressive advances in AI in recent years, in areas such as perception and robotics as well as language.

Large language models are also foundational to big tech companies like Google and Facebook, which use them in areas like search, advertising, and content moderation. Building and training large language models can require millions of dollars’ worth of cloud computing power; so far, that’s limited their development and use to a handful of well-heeled tech companies.

But big models are problematic, too. Language models inherit bias and offensive text from the data they are trained on, and they have zero grasp of common sense or what is true or false. Given a prompt, a large language model may spit out unpleasant language or misinformation. There is also no guarantee that these large models will continue to produce advances in machine intelligence.

The Stanford proposal has divided the research community. “Calling them ‘foundation models’ completely messes up the discourse,” says Subbarao Kambhampati, a professor at Arizona State University. There is no clear path from these models to more general forms of AI, Kambhampati says.

Thomas Dietterich, a professor at Oregon State University and former president of the Association for the Advancement of Artificial Intelligence, says he has “huge respect” for the researchers behind the new Stanford center, and he believes they are genuinely concerned about the problems these models raise.

But Dietterich wonders if the idea of foundational models isn’t partly about getting funding for the resources needed to build and work on them. “I was surprised that they gave these models a fancy name and created a center,” he says. “That does smack of flag planting, which could have several benefits on the fundraising side.”

Stanford has also proposed the creation of a National AI Cloud to make industry-scale computing resources available to academics working on AI research projects.

Emily M. Bender, a professor in the linguistics department at the University of Washington, says she worries that the idea of foundational models reflects a bias toward investing in the data-centric approach to AI favored by industry.

Bender says it is especially important to study the risks posed by big AI models. She coauthored a paper, published in March, that drew attention to problems with large language models and contributed to the departure of two Google researchers. But she says scrutiny should come from multiple disciplines.

“There are all of these other adjacent, really important fields that are just starved for funding,” she says. “Before we throw money into the cloud, I would like to see money going into other disciplines.”