As more and more problems with AI have surfaced, including biases around race, gender, and age, many tech companies have installed “ethical AI” teams ostensibly dedicated to identifying and mitigating such issues.
Twitter’s META unit was more progressive than most in publishing details of problems with the company’s AI systems, and in allowing outside researchers to probe its algorithms for new issues.
Last year, after Twitter users noticed that a photo-cropping algorithm seemed to favor white faces when choosing how to trim images, Twitter made the unusual decision to let its META unit publish details of the bias it uncovered. The group also launched one of the first-ever “bias bounty” contests, which let outside researchers test the algorithm for other problems. Last October, Chowdhury’s team also published details of unintentional political bias on Twitter, showing how right-leaning news sources were, in fact, promoted more than left-leaning ones.
Many outside researchers saw the layoffs as a blow, not just for Twitter but for efforts to improve AI. “What a tragedy,” Kate Starbird, an associate professor at the University of Washington who studies online disinformation, wrote on Twitter.
“The META team was one of the only good case studies of a tech company running an AI ethics group that interacts with the public and academia with substantial credibility,” says Ali Alkhatib, director of the Center for Applied Data Ethics at the University of San Francisco.
Alkhatib says Chowdhury is incredibly well thought of within the AI ethics community and her team did genuinely valuable work holding Big Tech to account. “There aren’t many corporate ethics teams worth taking seriously,” he says. “This was one of the ones whose work I taught in classes.”
Mark Riedl, a professor studying AI at Georgia Tech, says the algorithms that Twitter and other social media giants use have a huge impact on people’s lives, and need to be studied. “Whether META had any impact inside Twitter is hard to discern from the outside, but the promise was there,” he says.
Riedl adds that letting outsiders probe Twitter’s algorithms was an important step toward more transparency and understanding of issues around AI. “They were becoming a watchdog that could help the rest of us understand how AI was affecting us,” he says. “The researchers at META had outstanding credentials with long histories of studying AI for social good.”
As for Musk’s idea of open-sourcing the Twitter algorithm, the reality would be far more complicated. There are many different algorithms that affect the way information is surfaced, and it’s challenging to understand them without the real-time data they are fed in the form of tweets, views, and likes.
The idea that there is one algorithm with explicit political leaning might oversimplify a system that can harbor more insidious biases and problems. Uncovering these is precisely the kind of work that Twitter’s META group was doing. “There aren’t many groups that rigorously study their own algorithms’ biases and errors,” says Alkhatib at the University of San Francisco. “META did that.” And now, it doesn’t.
In one example of the IC’s successful use of AI, after exhausting all other avenues—from human spies to signals intelligence—the US was able to find an unidentified WMD research and development facility in a large Asian country by locating a bus that traveled between it and other known facilities. To do that, analysts employed algorithms to search and evaluate images of nearly every square inch of the country, according to a senior US intelligence official who spoke on background with the understanding that they would not be named.
While AI can calculate, retrieve, and employ programming that performs limited rational analyses, it lacks the calculus to properly dissect more emotional or unconscious components of human intelligence that are described by psychologists as system 1 thinking.
AI, for example, can draft intelligence reports that are akin to newspaper articles about baseball, which follow a structured but non-logical flow and repeat standard content elements. However, when briefs require complex reasoning or logical arguments that justify or demonstrate conclusions, AI has been found lacking. When the intelligence community tested the capability, the intelligence official says, the product looked like an intelligence brief but was otherwise nonsensical.
Such algorithmic processes can be made to overlap, adding layers of complexity to computational reasoning, but even then those algorithms can’t interpret context as well as humans, especially when it comes to language, like hate speech.
AI’s comprehension might be more analogous to the comprehension of a human toddler, says Eric Curwin, chief technology officer at Pyrra Technologies, which identifies virtual threats to clients from violence to disinformation. “For example, AI can understand the basics of human language, but foundational models don’t have the latent or contextual knowledge to accomplish specific tasks,” Curwin says.
“From an analytic perspective, AI has a difficult time interpreting intent,” Curwin adds. “Computer science is a valuable and important field, but it is social computational scientists that are taking the big leaps in enabling machines to interpret, understand, and predict behavior.”
In order to “build models that can begin to replace human intuition or cognition,” Curwin explains, “researchers must first understand how to interpret behavior and translate that behavior into something AI can learn.”
Although machine learning and big data analytics can provide predictive analysis about what might or will likely happen, they can’t explain to analysts how or why they arrived at those conclusions. The opaqueness of AI reasoning and the difficulty of vetting its sources, which consist of extremely large data sets, can affect the actual or perceived soundness and transparency of those conclusions.
Transparency in reasoning and sourcing is a requirement of the analytic tradecraft standards for products produced by and for the intelligence community. Analytic objectivity is also required by statute, sparking calls within the US government to update such standards and laws in light of AI’s increasing prevalence.
Machine learning and algorithms, when employed for predictive judgments, are also considered by some intelligence practitioners to be more art than science. That is, they are prone to biases and noise, and they can rely on methodologies that are not sound, leading to errors similar to those found in the criminal forensic sciences and arts.
Another week, another privacy horror show: Crisis Text Line, a nonprofit text message service for people experiencing serious mental health crises, has been using “anonymized” conversation data to power a for-profit machine learning tool for customer support teams. (After backlash, CTL announced it would stop.) Crisis Text Line’s response to the backlash focused on the data itself and whether it included personally identifiable information. But that response uses data as a distraction. Imagine this: Say you texted Crisis Text Line and got back a message that said “Hey, just so you know, we’ll use this conversation to help our for-profit subsidiary build a tool for companies who do customer support.” Would you keep texting?
That’s the real travesty—when the price of obtaining mental health help in a crisis is becoming grist for the profit mill. And it’s not just users of CTL who pay; it’s everyone who goes looking for help when they need it most.
Americans need help and can’t get it. The huge unmet demand for critical advice and help has given rise to a new class of organizations and software tools that exist in a regulatory gray area. They help people with bankruptcy or evictions, but they aren’t lawyers; they help people with mental health crises, but they aren’t care providers. They invite ordinary people to rely on them and often do provide real help. But these services can also avoid taking responsibility for their advice, or even abuse the trust people have put in them. They can make mistakes, push predatory advertising and disinformation, or just outright sell data. And the consumer safeguards that would normally protect people from malfeasance or mistakes by lawyers or doctors haven’t caught up.
This regulatory gray area can also constrain organizations that have novel solutions to offer. Take Upsolve, a nonprofit that develops software to guide people through bankruptcy. (The organization takes pains to claim it does not offer legal advice.) Upsolve wants to train New York community leaders to help others navigate the city’s notorious debt courts. One problem: These would-be trainees aren’t lawyers, so under New York (and nearly every other state) law, Upsolve’s initiative would be illegal. Upsolve is now suing to carve out an exception for itself. The group claims, quite rightly, that a lack of legal help means people effectively lack rights under the law.
The legal profession’s failure to grant Americans access to support is well-documented. But Upsolve’s lawsuit also raises new, important questions. Who is ultimately responsible for the advice given under a program like this, and who is responsible for a mistake—a trainee, a trainer, both? How do we teach people about their rights as a client of this service, and how to seek recourse? These are eminently answerable questions. There are lots of policy tools for creating relationships with elevated responsibilities: We could assign advice-givers a special legal status, establish a duty of loyalty for organizations that handle sensitive data, or create policy sandboxes to test and learn from new models for delivering advice.
But instead of using these tools, most regulators seem content to bury their heads in the sand. Officially, you can’t give legal advice or health advice without a professional credential. Unofficially, people can get such advice in all but name from tools and organizations operating in the margins. And while credentials can be important, regulators are failing to engage with the ways software has fundamentally changed how we give advice and care for one another, and what that means for the responsibilities of advice-givers.
And we need that engagement more than ever. People who seek help from experts or caregivers are vulnerable. They may not be able to distinguish a good service from a bad one. They don’t have time to parse terms of service dense with jargon, caveats, and disclaimers. And they have little to no negotiating power to set better terms, especially when they’re reaching out mid-crisis. That’s why the fiduciary duties that lawyers and doctors have are so necessary in the first place: not just to protect a person seeking help once, but to give people confidence that they can seek help from experts for the most critical, sensitive issues they face. In other words, a lawyer’s duty to their client isn’t just to protect that client from that particular lawyer; it’s to protect society’s trust in lawyers.
And that’s the true harm—when people won’t contact a suicide hotline because they don’t trust that the hotline has their sole interest at heart. That distrust can be contagious: Crisis Text Line’s actions might not just stop people from using Crisis Text Line. They might stop people from using any similar service. What’s worse than not being able to find help? Not being able to trust it.
The character of conflict between nations has fundamentally changed. Governments and militaries now fight on our behalf in the “gray zone,” where the boundaries between peace and war are blurred. They must navigate a complex web of ambiguous and deeply interconnected challenges, ranging from political destabilization and disinformation campaigns to cyberattacks, assassinations, proxy operations, election meddling, and perhaps even human-made pandemics. Add to this list the existential threat of climate change (and its geopolitical ramifications) and it is clear that what constitutes a national security issue has broadened, with each crisis straining or degrading the fabric of national resilience.
Traditional analysis tools are poorly equipped to predict and respond to these blurred and intertwined threats. Instead, in 2022 governments and militaries will use sophisticated and credible real-life simulations, putting software at the heart of their decision-making and operating processes. The UK Ministry of Defence, for example, is developing what it calls a military Digital Backbone. This will incorporate cloud computing, modern networks, and a new transformative capability called a Single Synthetic Environment, or SSE.
This SSE will combine artificial intelligence, machine learning, computational modeling, and modern distributed systems with trusted data sets from multiple sources to support detailed, credible simulations of the real world. This data will be owned by critical institutions, but will also be sourced via an ecosystem of trusted partners, such as the Alan Turing Institute.
An SSE offers a multilayered simulation of a city, region, or country, including high-quality mapping and information about critical national infrastructure, such as power, water, transport networks, and telecommunications. This can then be overlaid with other information, such as smart-city data, information about military deployment, or data gleaned from social listening. From this, models can be constructed that give a rich, detailed picture of how a region or city might react to a given event: a disaster, an epidemic, or a cyberattack, or a combination of such events organized by state enemies.
Defense synthetics are not a new concept. However, previous solutions have been built in a standalone way that limits reuse, longevity, choice, and—crucially—the speed of insight needed to effectively counteract gray-zone threats.
National security officials will be able to use SSEs to identify threats early, understand them better, explore their response options, and analyze the likely consequences of different actions. They will even be able to use them to train, rehearse, and implement their plans. By running thousands of simulated futures, senior leaders will be able to grapple with complex questions, refining policies and complex plans in a virtual world before implementing them in the real one.
One key question that will only grow in importance in 2022 is how countries can best secure their populations and supply chains against dramatic weather events driven by climate change. SSEs will be able to help answer this by combining regional infrastructure, network, road, and population data with meteorological models to see how and when such events might unfold.
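It is easier to picture how such simulated futures might work with a deliberately simplified sketch. The Python snippet below is not the MoD’s actual SSE, whose data sets and models are not public; every field and number in it (population, hospital beds, road redundancy, flood defences) is invented for illustration. It simply layers a few regional figures into a toy model and runs thousands of randomized storm scenarios to compare two policy options.

```python
import random
from dataclasses import dataclass

# Hypothetical, simplified data layers for one region. A real SSE would draw
# these from trusted, curated sources (mapping, infrastructure, census,
# meteorological feeds); here they are toy numbers for illustration only.
@dataclass
class RegionModel:
    population: int          # people living in the region
    hospital_beds: int       # critical-infrastructure capacity
    road_redundancy: float   # 0..1, how many alternative routes exist
    flood_defence: float     # 0..1, strength of flood protection

def simulate_storm(region: RegionModel, rng: random.Random) -> int:
    """Run one simulated future of a severe storm and return the number of
    people left needing assistance. Purely illustrative dynamics."""
    storm_severity = rng.uniform(0.2, 1.0)               # sampled weather outcome
    flooding = max(0.0, storm_severity - region.flood_defence)
    cut_off = flooding * (1.0 - region.road_redundancy)  # share cut off from help
    affected = int(region.population * cut_off)
    treatable = min(affected, region.hospital_beds)
    return affected - treatable

def run_futures(region: RegionModel, n: int = 10_000, seed: int = 0) -> float:
    """Average outcome over many simulated futures, Monte Carlo style."""
    rng = random.Random(seed)
    return sum(simulate_storm(region, rng) for _ in range(n)) / n

if __name__ == "__main__":
    baseline = RegionModel(population=500_000, hospital_beds=2_000,
                           road_redundancy=0.4, flood_defence=0.5)
    upgraded = RegionModel(population=500_000, hospital_beds=2_000,
                           road_redundancy=0.4, flood_defence=0.8)
    # Compare two policy options before acting in the real world.
    print("baseline unmet need:", run_futures(baseline))
    print("with better defences:", run_futures(upgraded))
```

A real SSE would replace these toy inputs with trusted data feeds and far richer models, but the decision-making pattern is the same: compare thousands of simulated outcomes before committing to a course of action in the real world.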
Fears of artificial intelligence fill the news: job losses, inequality, discrimination, misinformation, or even a superintelligence dominating the world. The one group everyone assumes will benefit is business, but the data seems to disagree. Amid all the hype, US businesses have been slow to adopt the most advanced AI technologies, and there is little evidence that such technologies are contributing significantly to productivity growth or job creation.
This disappointing performance is not merely due to the relative immaturity of AI technology. It also comes from a fundamental mismatch between the needs of business and the way AI is currently being conceived by many in the technology sector—a mismatch that has its origins in Alan Turing’s pathbreaking 1950 “imitation game” paper and the so-called Turing test he proposed therein.
The Turing test defines machine intelligence by imagining a computer program that can so successfully imitate a human in an open-ended text conversation that it isn’t possible to tell whether one is conversing with a machine or a person.
At best, this was only one way of articulating machine intelligence. Turing himself, and other technology pioneers such as Douglas Engelbart and Norbert Wiener, understood that computers would be most useful to business and society when they augmented and complemented human capabilities, not when they competed directly with us. Search engines, spreadsheets, and databases are good examples of such complementary forms of information technology. While their impact on business has been immense, they are not usually referred to as “AI,” and in recent years the success story that they embody has been submerged by a yearning for something more “intelligent.” This yearning is poorly defined, however, and with surprisingly little attempt to develop an alternative vision, it has increasingly come to mean surpassing human performance in tasks such as vision and speech, and in parlor games such as chess and Go. This framing has become dominant both in public discussion and in terms of the capital investment surrounding AI.
Economists and other social scientists emphasize that intelligence arises not only, or even primarily, in individual humans, but most of all in collectives such as firms, markets, educational systems, and cultures. Technology can play two key roles in supporting collective forms of intelligence. First, as emphasized in Douglas Engelbart’s pioneering research in the 1960s and the subsequent emergence of the field of human-computer interaction, technology can enhance the ability of individual humans to participate in collectives, by providing them with information, insights, and interactive tools. Second, technology can create new kinds of collectives. This latter possibility offers the greatest transformative potential. It provides an alternative framing for AI, one with major implications for economic productivity and human welfare.
Businesses succeed at scale when they successfully divide labor internally and bring diverse skill sets into teams that work together to create new products and services. Markets succeed when they bring together diverse sets of participants, facilitating specialization in order to enhance overall productivity and social welfare. This is exactly what Adam Smith understood more than two and a half centuries ago. Translating his message into the current debate, technology should focus on the complementarity game, not the imitation game.
We already have many examples of machines enhancing productivity by performing tasks that are complementary to those performed by humans. These include the massive calculations that underpin the functioning of everything from modern financial markets to logistics, the transmission of high-fidelity images across long distances in the blink of an eye, and the sorting through reams of information to pull out relevant items.
What is new in the current era is that computers can now do more than simply execute lines of code written by a human programmer. Computers are able to learn from data and they can now interact, infer, and intervene in real-world problems, side by side with humans. Instead of viewing this breakthrough as an opportunity to turn machines into silicon versions of human beings, we should focus on how computers can use data and machine learning to create new kinds of markets, new services, and new ways of connecting humans to each other in economically rewarding ways.
An early example of such economics-aware machine learning is provided by recommendation systems, an innovative form of data analysis that came to prominence in the 1990s in consumer-facing companies such as Amazon (“You may also like”) and Netflix (“Top picks for you”). Recommendation systems have since become ubiquitous, and have had a significant impact on productivity. They create value by exploiting the collective wisdom of the crowd to connect individuals to products.
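The mechanism is easier to see in a minimal sketch of the co-occurrence idea behind “You may also like.” The Python example below is not Amazon’s or Netflix’s actual system; the shopping baskets and item names are invented, and production recommenders use far more sophisticated models. It shows only the core point: the recommendation comes from the crowd’s collective behavior, not from a deep model of any individual shopper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase histories. In production these would be millions of
# users and items, but the wisdom-of-the-crowd idea is the same.
baskets = [
    {"guitar", "amp", "tuner"},
    {"guitar", "amp", "strings"},
    {"guitar", "strings"},
    {"keyboard", "amp"},
]

# Count how often each pair of items is bought together.
co_counts = defaultdict(int)
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1

def recommend(item: str, k: int = 3) -> list[str]:
    """'Customers who bought this also bought': rank items by co-occurrence."""
    scores = {b: n for (a, b), n in co_counts.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("guitar"))  # e.g. ['amp', 'strings', 'tuner']
```

Even this crude counting illustrates the market-making role described above: it connects one buyer to products surfaced by the behavior of many others.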
Emerging examples of this new paradigm include the use of machine learning to forge direct connections between musicians and listeners, writers and readers, and game creators and players. Early innovators in this space include Airbnb, Uber, YouTube, and Shopify, and the phrase “creator economy” is being used as the trend gathers steam. A key aspect of such collectives is that they are, in fact, markets—economic value is associated with the links among the participants. Research is needed on how to blend machine learning, economics, and sociology so that these markets are healthy and yield sustainable income for the participants.
Democratic institutions can also be supported and strengthened by this innovative use of machine learning. The digital ministry in Taiwan has harnessed statistical analysis and online participation to scale up the kind of deliberative conversations that lead to effective team decision-making in the best-managed companies.