Last month, Stanford researchers declared that a new era of artificial intelligence had arrived, one built atop colossal neural networks and oceans of data. They said a new research center at Stanford would build and study these “foundation models” of AI.
Critics of the idea surfaced quickly—including at the workshop organized to mark the launch of the new center. Some object to the limited capabilities and sometimes freakish behavior of these models; others warn of focusing too heavily on one way of making machines smarter.
“I think the term ‘foundation’ is horribly wrong,” Jitendra Malik, a professor at UC Berkeley who studies AI, told workshop attendees in a video discussion.
Malik acknowledged that one type of model identified by the Stanford researchers—large language models that can answer questions or generate text from a prompt—has great practical use. But he said evolutionary biology suggests that language builds on other aspects of intelligence like interaction with the physical world.
“These models are really castles in the air; they have no foundation whatsoever,” Malik said. “The language we have in these models is not grounded, there is this fakeness, there is no real understanding.” He declined an interview request.
A research paper coauthored by dozens of Stanford researchers describes “an emerging paradigm for building artificial intelligence systems” that it labeled “foundation models.” Ever-larger AI models have produced some impressive advances in AI in recent years, in areas such as perception and robotics as well as language.
Large language models are also foundational to big tech companies like Google and Facebook, which use them in areas like search, advertising, and content moderation. Building and training large language models can require millions of dollars’ worth of cloud computing power; so far, that has limited their development and use to a handful of well-heeled tech companies.
But big models are problematic, too. Language models inherit bias and offensive text from the data they are trained on, and they have zero grasp of common sense or what is true or false. Given a prompt, a large language model may spit out unpleasant language or misinformation. There is also no guarantee that these large models will continue to produce advances in machine intelligence.
The Stanford proposal has divided the research community. “Calling them ‘foundation models’ completely messes up the discourse,” says Subbarao Kambhampati, a professor at Arizona State University. There is no clear path from these models to more general forms of AI, Kambhampati says.
Thomas Dietterich, a professor at Oregon State University and former president of the Association for the Advancement of Artificial Intelligence, says he has “huge respect” for the researchers behind the new Stanford center, and he believes they are genuinely concerned about the problems these models raise.
But Dietterich wonders if the idea of foundational models isn’t partly about getting funding for the resources needed to build and work on them. “I was surprised that they gave these models a fancy name and created a center,” he says. “That does smack of flag planting, which could have several benefits on the fundraising side.”
Stanford has also proposed the creation of a National AI Cloud to make industry-scale computing resources available to academics working on AI research projects.
Emily M. Bender, a professor in the linguistics department at the University of Washington, says she worries that the idea of foundational models reflects a bias toward investing in the data-centric approach to AI favored by industry.
Bender says it is especially important to study the risks posed by big AI models. She coauthored a paper, published in March, that drew attention to problems with large language models and contributed to the departure of two Google researchers. But she says scrutiny should come from multiple disciplines.
“There are all of these other adjacent, really important fields that are just starved for funding,” she says. “Before we throw money into the cloud, I would like to see money going into other disciplines.”
The design can run a big neural network more efficiently than banks of GPUs wired together. But manufacturing and running the chip is a challenge, requiring new methods for etching silicon features, a design that includes redundancies to account for manufacturing flaws, and a novel water system to keep the giant chip chilled.
To build a cluster of WSE-2 chips capable of running AI models of record size, Cerebras had to solve another engineering challenge: how to get data in and out of the chip efficiently. Regular chips have their own memory on board, but Cerebras developed an off-chip memory box called MemoryX. The company also created software that allows a neural network to be partially stored in that off-chip memory, with only the computations shuttled over to the silicon chip. And it built a hardware and software system called SwarmX that wires everything together.
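One way to picture the off-chip arrangement described above is a toy sketch in which every layer's weights live in an external store and are handed to the compute step one layer at a time. This is only an illustration of the general idea, not Cerebras' software: the class `ExternalWeightStore` and the functions `matvec` and `forward` are hypothetical names, and the tiny two-layer network is made up.

```python
from typing import Dict, List

Matrix = List[List[float]]


class ExternalWeightStore:
    """Hypothetical stand-in for off-chip memory that holds every layer's weights."""

    def __init__(self) -> None:
        self._layers: Dict[int, Matrix] = {}

    def put(self, index: int, weights: Matrix) -> None:
        self._layers[index] = weights

    def fetch(self, index: int) -> Matrix:
        return self._layers[index]

    def __len__(self) -> int:
        return len(self._layers)


def matvec(weights: Matrix, x: List[float]) -> List[float]:
    """Multiply a weight matrix by a vector (the work done 'on the chip')."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def forward(store: ExternalWeightStore, x: List[float]) -> List[float]:
    """Run the layers in order, holding one layer's weights at a time."""
    for i in range(len(store)):
        weights = store.fetch(i)  # bring in only this layer's weights
        x = matvec(weights, x)    # compute with them
        del weights               # drop the local reference before the next layer
    return x


# Made-up two-layer network, purely for illustration.
store = ExternalWeightStore()
store.put(0, [[1.0, 0.0], [0.0, 2.0]])
store.put(1, [[1.0, 1.0]])
result = forward(store, [3.0, 4.0])
print(result)
```

The point of the sketch is that the compute step never needs the whole model at once, which is the property that lets model size grow beyond what a single chip's memory can hold.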
“They can improve the scalability of training to huge dimensions, beyond what anybody is doing today,” says Mike Demler, a senior analyst with the Linley Group and a senior editor of The Microprocessor Report.
Demler says it isn’t yet clear how much of a market there will be for the cluster, especially since some potential customers are already designing their own, more specialized chips in-house. He adds that the real performance of the chip, in terms of speed, efficiency, and cost, is as yet unclear. Cerebras hasn’t published any benchmark results so far.
“There’s a lot of impressive engineering in the new MemoryX and SwarmX technology,” Demler says. “But just like the processor, this is highly specialized stuff; it only makes sense for training the very largest models.”
Cerebras’ chips have so far been adopted by labs that need supercomputing power. Early customers include Argonne National Labs, Lawrence Livermore National Lab, pharma companies including GlaxoSmithKline and AstraZeneca, and what Cerebras CEO Andrew Feldman describes as “military intelligence” organizations.
This shows that the Cerebras chip can be used for more than just powering neural networks; the computations these labs run involve similarly massive parallel mathematical operations. “And they’re always thirsty for more compute power,” says Demler, who adds that the chip could conceivably become important for the future of supercomputing.
David Kanter, an analyst with Real World Technologies and executive director of MLCommons, an organization that measures the performance of different AI algorithms and hardware, says he sees a future market for much bigger AI models. “I generally tend to believe in data-centric ML [machine learning], so we want larger data sets that enable building larger models with more parameters,” Kanter says.
A complication of infection known as sepsis is the number one killer in US hospitals. So it’s not surprising that more than 100 health systems use an early warning system offered by Epic Systems, the dominant provider of US electronic health records. The system throws up alerts based on a proprietary formula tirelessly watching for signs of the condition in a patient’s test results.
But a new study using data from nearly 30,000 patients in University of Michigan hospitals suggests Epic’s system performs poorly. The authors say it missed two-thirds of sepsis cases, rarely found cases medical staff did not notice, and frequently issued false alarms.
Karandeep Singh, an assistant professor at University of Michigan who led the study, says the findings illustrate a broader problem with the proprietary algorithms increasingly used in health care. “They’re very widely used, and yet there’s very little published on these models,” Singh says. “To me that’s shocking.”
The study was published Monday in JAMA Internal Medicine. An Epic spokesperson disputed the study’s conclusions, saying the company’s system has “helped clinicians save thousands of lives.”
Epic’s is not the first widely used health algorithm to trigger concerns that technology supposed to improve health care is not delivering, or is even actively harmful. In 2019, a system used on millions of patients to prioritize access to special care for people with complex needs was found to lowball the needs of Black patients compared to white patients. That prompted some Democratic senators to ask federal regulators to investigate bias in health algorithms. A study published in April found that statistical models used to predict suicide risk in mental health patients performed well for white and Asian patients but poorly for Black patients.
The way sepsis stalks hospital wards has made it a special target of algorithmic aids for medical staff. Guidelines from the Centers for Disease Control and Prevention to health providers on sepsis encourage use of electronic medical records for surveillance and predictions. Epic has several competitors offering commercial warning systems, and some US research hospitals have built their own tools.
Automated sepsis warnings have huge potential, Singh says, because key symptoms of the condition, such as low blood pressure, can have other causes, making it difficult for staff to spot early. Starting sepsis treatment such as antibiotics just an hour sooner can make a big difference to patient survival. Hospital administrators often take special interest in sepsis response, in part because it contributes to US government hospital ratings.
Singh runs a lab at Michigan researching applications of machine learning to patient care. He got curious about Epic’s sepsis warning system after being asked to chair a committee at the university’s health system created to oversee uses of machine learning.
As Singh learned more about the tools in use at Michigan and other health systems, he became concerned that they mostly came from vendors that disclosed little about how they worked or performed. His own system had a license to use Epic’s sepsis prediction model, which the company told customers was highly accurate. But there had been no independent validation of its performance.
Singh and Michigan colleagues tested Epic’s prediction model on records for nearly 30,000 patients covering almost 40,000 hospitalizations in 2018 and 2019. The researchers noted how often Epic’s algorithm flagged people who developed sepsis as defined by the CDC and the Centers for Medicare and Medicaid Services. And they compared the alerts that the system would have triggered with sepsis treatments logged by staff, who did not see Epic sepsis alerts for patients included in the study.
The researchers say their results suggest Epic’s system wouldn’t make a hospital much better at catching sepsis and could burden staff with unnecessary alerts. The company’s algorithm did not identify two-thirds of the roughly 2,500 sepsis cases in the Michigan data. It would have alerted for 183 patients who developed sepsis but had not been given timely treatment by staff.
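As a rough worked example, the share of real cases a model catches (its sensitivity) can be estimated from the rounded figures quoted above. The counts below are approximations taken from this article, not the study's exact numbers, and the function name `sensitivity` is only illustrative.

```python
# Illustrative arithmetic only: the inputs are the rounded figures quoted in
# the article, not the exact counts reported in the study.

def sensitivity(true_positives: float, false_negatives: float) -> float:
    """Share of actual cases that the model flagged."""
    return true_positives / (true_positives + false_negatives)


total_cases = 2500            # "roughly 2,500 sepsis cases"
missed = total_cases * 2 / 3  # "did not identify two-thirds"
caught = total_cases - missed

approx_sensitivity = sensitivity(caught, missed)
print(f"approximate sensitivity: {approx_sensitivity:.2f}")
```

On these rounded inputs the model would flag about one in three patients who went on to develop sepsis.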
A new video from human rights organization Amnesty International maps the locations of more than 15,000 cameras used by the New York Police Department, both for routine surveillance and in facial-recognition searches. A 3D model shows the 200-meter range of a camera, part of a sweeping dragnet capturing the unwitting movements of nearly half of the city’s residents, putting them at risk for misidentification. The group says it is the first to map the locations of that many cameras in the city.
Amnesty International and a team of volunteer researchers mapped cameras that can feed NYPD’s much criticized facial-recognition systems in three of the city’s five boroughs—Manhattan, Brooklyn, and the Bronx—finding 15,280 in total. Brooklyn is the most surveilled, with over 8,000 cameras.
“You are never anonymous,” says Matt Mahmoudi, the AI researcher leading the project. The NYPD has used the cameras in almost 22,000 facial-recognition searches since 2017, according to NYPD documents obtained by the Surveillance Technology Oversight Project, a New York privacy group.
“Whether you’re attending a protest, walking to a particular neighborhood, or even just grocery shopping, your face can be tracked by facial-recognition technology using imagery from thousands of camera points across New York,” Mahmoudi says.
The cameras are often placed on top of buildings, on street lights, and at intersections. The city itself owns thousands of cameras; in addition, private businesses and homeowners often grant access to police.
Police can compare faces captured by these cameras to criminal databases to search for potential suspects. Earlier this year, the NYPD was required to disclose the details of its facial-recognition systems for public comment. But those disclosures didn’t include the number or location of cameras, or any details of how long data is retained or with whom data is shared.
The Amnesty International team found that the cameras are often clustered in majority nonwhite neighborhoods. NYC’s most surveilled neighborhood is East New York, Brooklyn, where the group found 577 cameras in less than 2 square miles. More than 90 percent of East New York’s residents are nonwhite, according to city data.
Facial-recognition systems often perform less accurately on darker-skinned people than lighter-skinned people. In 2016, Georgetown University researchers found that police departments across the country used facial recognition to identify nonwhite potential suspects more than their white counterparts.
In a statement, an NYPD spokesperson said the department never arrests anyone “solely on the basis of a facial-recognition match,” and only uses the tool to investigate “a suspect or suspects related to the investigation of a particular crime.”
“Where images are captured at or near a specific crime, comparison of the image of a suspect can be made against a database that includes only mug shots legally held in law enforcement records based on prior arrests,” the statement reads.
Amnesty International is releasing the map and accompanying videos as part of its #BantheScan campaign urging city officials to ban police use of the tool ahead of the city’s mayoral primary later this month. In May, Vice asked mayoral candidates if they’d support a ban on facial recognition. While most didn’t respond to the inquiry, candidate Dianne Morales told the publication she supported a ban, while candidates Shaun Donovan and Andrew Yang suggested auditing for disparate impact before deciding on any regulation.
In recent years, researchers have used artificial intelligence to improve translation between programming languages or automatically fix problems. The AI system DrRepair, for example, has been shown to solve most issues that spawn error messages. But some researchers dream of the day when AI can write programs based on simple descriptions from non-experts.
On Tuesday, Microsoft and OpenAI shared plans to bring GPT-3, one of the world’s most advanced models for generating text, to programming based on natural language descriptions. This is the first commercial application of GPT-3 undertaken since Microsoft invested $1 billion in OpenAI last year and gained exclusive licensing rights to GPT-3.
“If you can describe what you want to do in natural language, GPT-3 will generate a list of the most relevant formulas for you to choose from,” said Microsoft CEO Satya Nadella in a keynote address at the company’s Build developer conference. “The code writes itself.”
Microsoft VP Charles Lamanna told WIRED the sophistication offered by GPT-3 can help people tackle complex challenges and empower people with little coding experience. GPT-3 will translate natural language into PowerFx, a fairly simple programming language similar to Excel commands that Microsoft introduced in March.
This is the latest demonstration of applying AI to coding. Last year at Microsoft’s Build, OpenAI CEO Sam Altman demoed a language model fine-tuned with code from GitHub that automatically generates lines of Python code. As WIRED detailed last month, startups like SourceAI are also using GPT-3 to generate code. IBM last month showed how its Project CodeNet, with 14 million code samples from more than 50 programming languages, could reduce the time needed to update a program with millions of lines of Java code for an automotive company from one year to one month.
Microsoft’s new feature is based on a neural network architecture known as Transformer, used by big tech companies including Baidu, Google, Microsoft, Nvidia, and Salesforce to create large language models using text training data scraped from the web. These language models continually grow larger. The largest version of Google’s BERT, a language model released in 2018, had 340 million parameters, the building blocks of neural networks. GPT-3, which was released one year ago, has 175 billion parameters.
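To put that growth in perspective, a quick calculation using only the two parameter counts cited above gives the size ratio between the models. The variable and function names are illustrative.

```python
# Parameter counts as cited in the article.
bert_large_params = 340_000_000   # largest version of BERT (2018)
gpt3_params = 175_000_000_000     # GPT-3


def growth_factor(new: int, old: int) -> float:
    """How many times larger the newer model is than the older one."""
    return new / old


ratio = growth_factor(gpt3_params, bert_large_params)
print(f"GPT-3 has roughly {ratio:.0f} times as many parameters")
```

By this measure GPT-3 is roughly 500 times the size of the largest BERT.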
Such efforts have a long way to go, however. In one recent test, the best model succeeded only 14 percent of the time on introductory programming challenges compiled by a group of AI researchers.