Artificial intelligence is here. It’s overhyped, poorly understood, and flawed but already core to our lives—and it’s only going to extend its reach.
AI powers driverless car research, spots otherwise invisible signs of disease on medical images, finds an answer when you ask Alexa a question, and lets you unlock your phone with your face or talk to friends as an animated poop on the iPhone X using Apple’s Animoji. Those are just a few ways AI already touches our lives, and there’s plenty of work still to be done. But don’t worry, superintelligent algorithms aren’t about to take all the jobs or wipe out humanity.
The current boom in all things AI was catalyzed by breakthroughs in an area known as machine learning. It involves “training” computers to perform tasks based on examples, rather than relying on programming by a human. A technique called deep learning has made this approach much more powerful. Just ask Lee Sedol, holder of 18 international titles at the complex game of Go. He got creamed by software called AlphaGo in 2016.
There’s evidence that AI can make us happier and healthier. But there’s also reason for caution. Incidents in which algorithms picked up or amplified societal biases around race or gender show that an AI-enhanced future won’t automatically be a better one.
The Beginnings of Artificial Intelligence
Artificial intelligence as we know it began as a vacation project. Dartmouth professor John McCarthy coined the term in a 1955 proposal for a workshop held in the summer of 1956, when he invited a small group to spend a few weeks musing on how to make machines do things like use language.
He had high hopes of a breakthrough in the drive toward human-level machines. “We think that a significant advance can be made,” he wrote with his co-organizers, “if a carefully selected group of scientists work on it together for a summer.”
Those hopes were not met, and McCarthy later conceded that he had been overly optimistic. But the workshop helped researchers dreaming of intelligent machines coalesce into a recognized academic field.
Early work often focused on solving fairly abstract problems in math and logic. But it wasn’t long before AI started to show promising results on more human tasks. In the late 1950s, Arthur Samuel created programs that learned to play checkers. In 1962, one scored a win over a master at the game. In 1967, a program called Dendral showed it could replicate the way chemists interpreted mass-spectrometry data on the makeup of chemical samples.
As the field of AI developed, so did different strategies for making smarter machines. Some researchers tried to distill human knowledge into code or come up with rules for specific tasks, like understanding language. Others were inspired by the central role that learning plays in human and animal intelligence. They built systems that could get better at a task over time, perhaps by simulating evolution or by learning from example data. The field hit milestone after milestone as computers mastered tasks that previously only people could complete.
Deep learning, the rocket fuel of the current AI boom, is a revival of one of the oldest ideas in AI. The technique involves passing data through webs of math, known as artificial neural networks, that are loosely inspired by the workings of brain cells. As a network processes training data, connections between the parts of the network adjust, building up an ability to interpret future data.
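To make the idea of connections adjusting during training a little more concrete, here is a minimal sketch of a single artificial neuron trained with the classic perceptron learning rule, one of the earliest neural network training methods. It is a toy under simplifying assumptions, not how modern deep learning systems are built; those stack many layers of such units and train them with gradient-based methods. The function names and the logical-AND training data are illustrative choices, not drawn from any particular system.

```python
# A toy single-neuron perceptron (illustrative sketch, not a production model).
# Each weight plays the role of a "connection" that adjusts as examples arrive.


def step(x: float) -> int:
    """Threshold activation: output 1 if the weighted input is positive."""
    return 1 if x > 0 else 0


def predict(weights, bias, x) -> int:
    """Weighted sum of the inputs plus a bias, passed through the threshold."""
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Learn weights and a bias from labeled examples with the perceptron rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)
            # Nudge each connection in the direction that reduces the error;
            # when the prediction is already right, error is 0 and nothing changes.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Hypothetical training examples: the logical AND of two inputs.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])  # → [0, 0, 0, 1]
```

Nobody writes a rule for AND here; the behavior emerges from repeated small corrections on examples, which is the core contrast with hand-programmed rules.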
Artificial neural networks became an established idea in AI not long after the Dartmouth workshop. The room-filling Perceptron Mark 1 from 1958, for example, learned to distinguish different geometric shapes and got written up in The New York Times as the “Embryo of Computer Designed to Read and Grow Wiser.” But neural networks tumbled from favor after an influential 1969 book coauthored by MIT’s Marvin Minsky suggested they couldn’t be very powerful.
Not everyone was convinced by the skeptics, however, and some researchers kept the technique alive over the decades. They were vindicated in 2012, when a series of experiments showed that neural networks fueled with large piles of data could give machines new powers of perception. Churning through so much data was difficult using traditional computer chips, but a shift to graphics cards precipitated an explosion in processing power.
Another week, another privacy horror show: Crisis Text Line (CTL), a nonprofit text message service for people experiencing serious mental health crises, has been using “anonymized” conversation data to power a for-profit machine learning tool for customer support teams. (After backlash, CTL announced it would stop.) Crisis Text Line’s response to the backlash focused on the data itself and whether it included personally identifiable information. But that response uses data as a distraction. Imagine this: Say you texted Crisis Text Line and got back a message that said “Hey, just so you know, we’ll use this conversation to help our for-profit subsidiary build a tool for companies who do customer support.” Would you keep texting?
That’s the real travesty—when the price of obtaining mental health help in a crisis is becoming grist for the profit mill. And it’s not just users of CTL who pay; it’s everyone who goes looking for help when they need it most.
Americans need help and can’t get it. The huge unmet demand for critical advice and help has given rise to a new class of organizations and software tools that exist in a regulatory gray area. They help people with bankruptcy or evictions, but they aren’t lawyers; they help people with mental health crises, but they aren’t care providers. They invite ordinary people to rely on them and often do provide real help. But these services can also avoid taking responsibility for their advice, or even abuse the trust people have put in them. They can make mistakes, push predatory advertising and disinformation, or just outright sell data. And the consumer safeguards that would normally protect people from malfeasance or mistakes by lawyers or doctors haven’t caught up.
This regulatory gray area can also constrain organizations that have novel solutions to offer. Take Upsolve, a nonprofit that develops software to guide people through bankruptcy. (The organization takes pains to claim it does not offer legal advice.) Upsolve wants to train New York community leaders to help others navigate the city’s notorious debt courts. One problem: These would-be trainees aren’t lawyers, so under the law of New York (and nearly every other state), Upsolve’s initiative would be illegal. Upsolve is now suing to carve out an exception for itself. The organization claims, quite rightly, that a lack of legal help means people effectively lack rights under the law.
The legal profession’s failure to grant Americans access to support is well-documented. But Upsolve’s lawsuit also raises new, important questions. Who is ultimately responsible for the advice given under a program like this, and who is responsible for a mistake—a trainee, a trainer, both? How do we teach people about their rights as a client of this service, and how to seek recourse? These are eminently answerable questions. There are lots of policy tools for creating relationships with elevated responsibilities: We could assign advice-givers a special legal status, establish a duty of loyalty for organizations that handle sensitive data, or create policy sandboxes to test and learn from new models for delivering advice.
But instead of using these tools, most regulators seem content to bury their heads in the sand. Officially, you can’t give legal advice or health advice without a professional credential. Unofficially, people can get such advice in all but name from tools and organizations operating in the margins. And while credentials can be important, regulators are failing to engage with the ways software has fundamentally changed how we give advice and care for one another, and what that means for the responsibilities of advice-givers.
And we need that engagement more than ever. People who seek help from experts or caregivers are vulnerable. They may not be able to distinguish a good service from a bad one. They don’t have time to parse terms of service dense with jargon, caveats, and disclaimers. And they have little to no negotiating power to set better terms, especially when they’re reaching out mid-crisis. That’s why the fiduciary duties that lawyers and doctors have are so necessary in the first place: not just to protect a person seeking help once, but to give people confidence that they can seek help from experts for the most critical, sensitive issues they face. In other words, a lawyer’s duty to their client isn’t just to protect that client from that particular lawyer; it’s to protect society’s trust in lawyers.
And that’s the true harm: when people won’t contact a suicide hotline because they don’t trust that the hotline has their sole interest at heart. That distrust can be contagious: Crisis Text Line’s actions might not just stop people from using Crisis Text Line. They might stop people from using any similar service. What’s worse than not being able to find help? Not being able to trust it.
A complication of infection known as sepsis is the number one killer in US hospitals. So it’s not surprising that more than 100 health systems use an early warning system offered by Epic Systems, the dominant provider of US electronic health records. The system throws up alerts based on a proprietary formula tirelessly watching for signs of the condition in a patient’s test results.
But a new study using data from nearly 30,000 patients in University of Michigan hospitals suggests Epic’s system performs poorly. The authors say it missed two-thirds of sepsis cases, rarely found cases medical staff did not notice, and frequently issued false alarms.
Karandeep Singh, an assistant professor at University of Michigan who led the study, says the findings illustrate a broader problem with the proprietary algorithms increasingly used in health care. “They’re very widely used, and yet there’s very little published on these models,” Singh says. “To me that’s shocking.”
The study was published Monday in JAMA Internal Medicine. An Epic spokesperson disputed the study’s conclusions, saying the company’s system has “helped clinicians save thousands of lives.”
Epic’s is not the first widely used health algorithm to trigger concerns that technology supposed to improve health care is not delivering, or is even actively harmful. In 2019, a system used on millions of patients to prioritize access to special care for people with complex needs was found to lowball the needs of Black patients compared to white patients. That prompted some Democratic senators to ask federal regulators to investigate bias in health algorithms. A study published in April found that statistical models used to predict suicide risk in mental health patients performed well for white and Asian patients but poorly for Black patients.
The way sepsis stalks hospital wards has made it a special target of algorithmic aids for medical staff. The Centers for Disease Control and Prevention’s sepsis guidelines for health providers encourage the use of electronic medical records for surveillance and prediction. Epic has several competitors offering commercial warning systems, and some US research hospitals have built their own tools.
Automated sepsis warnings have huge potential, Singh says, because key symptoms of the condition, such as low blood pressure, can have other causes, making it difficult for staff to spot early. Starting sepsis treatment such as antibiotics just an hour sooner can make a big difference to patient survival. Hospital administrators often take special interest in sepsis response, in part because it contributes to US government hospital ratings.
Singh runs a lab at Michigan researching applications of machine learning to patient care. He got curious about Epic’s sepsis warning system after being asked to chair a committee at the university’s health system created to oversee uses of machine learning.
As Singh learned more about the tools in use at Michigan and other health systems, he became concerned that they mostly came from vendors that disclosed little about how they worked or performed. His own health system had a license to use Epic’s sepsis prediction model, which the company told customers was highly accurate. But there had been no independent validation of its performance.
Singh and Michigan colleagues tested Epic’s prediction model on records for nearly 30,000 patients covering almost 40,000 hospitalizations in 2018 and 2019. The researchers noted how often Epic’s algorithm flagged people who developed sepsis as defined by the CDC and the Centers for Medicare and Medicaid Services. And they compared the alerts that the system would have triggered with sepsis treatments logged by staff, who did not see Epic sepsis alerts for patients included in the study.
The researchers say their results suggest Epic’s system wouldn’t make a hospital much better at catching sepsis and could burden staff with unnecessary alerts. The company’s algorithm did not identify two-thirds of the roughly 2,500 sepsis cases in the Michigan data. It would have alerted for 183 patients who developed sepsis but had not been given timely treatment by staff.
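The comparison the Michigan researchers ran boils down to a confusion matrix: each patient either triggered an alert or didn’t, and either developed sepsis or didn’t. A minimal sketch of two headline measures follows: sensitivity (what share of real cases the alert caught) and positive predictive value (what share of alerts pointed to a real case, which falls as false alarms pile up). The class and function names are hypothetical, and the counts in the usage example are invented for illustration only; they are chosen so the alert misses two-thirds of cases, echoing the study’s finding, but they are not the study’s data.

```python
from dataclasses import dataclass


@dataclass
class AlertCounts:
    """Hypothetical confusion-matrix counts for a yes/no warning system."""
    true_positives: int   # patients who developed sepsis and triggered an alert
    false_negatives: int  # patients who developed sepsis but never triggered one
    false_positives: int  # alerts for patients who did not develop sepsis


def sensitivity(c: AlertCounts) -> float:
    """Share of actual cases the alert caught (also called recall)."""
    return c.true_positives / (c.true_positives + c.false_negatives)


def positive_predictive_value(c: AlertCounts) -> float:
    """Share of alerts that were real cases; low values mean many false alarms."""
    return c.true_positives / (c.true_positives + c.false_positives)


# Invented example, NOT the study's figures: 300 sepsis cases, 200 of them
# missed (two-thirds), plus 900 alerts for patients who never had sepsis.
example = AlertCounts(true_positives=100, false_negatives=200, false_positives=900)
print(f"sensitivity: {sensitivity(example):.2f}")                # → sensitivity: 0.33
print(f"positive predictive value: {positive_predictive_value(example):.2f}")  # → positive predictive value: 0.10
```

The two numbers pull in different directions: a system can raise sensitivity by alerting more often, but every extra alert for a healthy patient drags down the positive predictive value, which is the alert-fatigue burden the researchers warn about.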