In public, Facebook seems to claim that it removes more than 90 percent of hate speech on its platform, but in private internal communications the company says the figure is only an atrocious 3 to 5 percent. Facebook wants us to believe that almost all hate speech is taken down, when in reality almost all of it remains on the platform.
This obscene hypocrisy was revealed amid the numerous complaints, based on thousands of pages of leaked internal documents, which Facebook employee-turned-whistleblower Frances Haugen and her legal team filed with the SEC earlier this month. While public attention on these leaks has focused on Instagram’s impact on teen health (which is hardly the smoking gun it’s been touted as) and on the News Feed algorithm’s role in amplifying misinformation (hardly a revelation), Facebook’s utter failure to limit hate speech and the simple deceptive trick it has consistently relied on to hide this failure are shocking. Together they expose just how much Facebook relies on AI for content moderation, just how ineffective that AI is, and the need to force Facebook to come clean.
In testimony to the US Senate in October 2020, Mark Zuckerberg pointed to the company’s transparency reports, which he said show that “we are proactively identifying, I think it’s about 94 percent of the hate speech we ended up taking down.” In testimony to the House a few months later, Zuckerberg similarly responded to questions about hate speech by citing a transparency report: “We also removed about 12 million pieces of content in Groups for violating our policies on hate speech, 87 percent of which we found proactively.” In nearly every quarterly transparency report, Facebook proclaims hate speech moderation percentages in the 80s and 90s like these. Yet a leaked document from March 2021 says, “We may action as little as 3-5% of hate … on Facebook.”
Was Facebook really caught in an egregious lie? Yes and no. Technically, both numbers are correct; they just measure different things. The measure that really matters is the one Facebook has been hiding. The measure Facebook has been reporting publicly is irrelevant. It’s a bit like being pulled over by a police officer who asks how fast you were going, and every time ignoring the question and bragging about your car’s gas mileage instead.
There are two ways that hate speech can be flagged for review and possible removal. Users can report it manually, or AI algorithms can try to detect it automatically. Algorithmic detection is important not just because it’s more efficient, but also because it can be done proactively, before any users flag the hate speech.
The 94 percent number that Facebook has publicly touted is the “proactive rate,” the number of hate speech items taken down that Facebook’s AI detected proactively, divided by the total number of hate speech items taken down. Facebook probably wants you to think this number conveys how much hate speech is taken down before it has an opportunity to cause harm—but all it really measures is how big a role algorithms play in hate-speech detection on the platform.
What matters to society is the amount of hate speech that is not removed from the platform. The best way to capture this is the number of hate-speech takedowns divided by the total number of hate speech instances. This “takedown rate” measures how much hate speech on Facebook is actually taken down—and it’s the number that Facebook tried to keep secret.
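To make the difference between the two measures concrete, here is a minimal sketch of both ratios as defined above. The function names and the counts are hypothetical, chosen only for illustration; they are not Facebook's data or its internal terminology.

```python
def proactive_rate(removed_by_ai: int, removed_total: int) -> float:
    """Share of removed hate speech that AI flagged before any user report."""
    return removed_by_ai / removed_total


def takedown_rate(removed_total: int, hate_speech_total: int) -> float:
    """Share of all hate speech on the platform that was actually removed."""
    return removed_total / hate_speech_total


# Hypothetical counts, for illustration only.
hate_speech_total = 1000  # every hate speech item posted
removed_total = 50        # items that were taken down
removed_by_ai = 47        # of those, items the AI flagged first

print(f"proactive rate: {proactive_rate(removed_by_ai, removed_total):.0%}")
print(f"takedown rate:  {takedown_rate(removed_total, hate_speech_total):.0%}")
```

With these made-up counts, the same platform has a proactive rate of 94 percent and a takedown rate of 5 percent. The two ratios share a numerator-side quantity (items removed) but have different denominators, which is why one can look excellent while the other is dismal.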
Thanks to Haugen, we finally know the takedown rate, and it is dismal. According to internal documents, more than 95 percent of hate speech shared on Facebook stays on Facebook. Zuckerberg boasted to Congress that Facebook took down 12 million pieces of hate speech in Groups, but based on the leaked estimate, we now know that around 250 million pieces of hate speech were likely left up. This is staggering, and it shows how little progress has been made since the early days of unregulated internet forums—despite the extensive investments Facebook has made in AI content moderation over the years.
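The scale of what was left up can be checked with back-of-envelope arithmetic. The sketch below assumes, as the estimate above does, that the leaked 3 to 5 percent takedown rate can be applied to the 12 million Groups removals Zuckerberg cited. The helper name is hypothetical, and the result is a rough order of magnitude, not a measured figure.

```python
def implied_left_up(removed: float, takedown_rate: float) -> float:
    """Items left up, given the number removed and the takedown rate.

    If removed = rate * total, then total = removed / rate,
    and the remainder (total - removed) stayed on the platform.
    """
    total = removed / takedown_rate
    return total - removed


removed_in_groups = 12_000_000  # figure cited in congressional testimony

# Assumption: the leaked 3-5% range applies to this removal count.
for rate in (0.03, 0.05):
    left = implied_left_up(removed_in_groups, rate)
    print(f"takedown rate {rate:.0%}: about {left / 1e6:.0f} million left up")
```

Under these assumptions the remainder runs into the hundreds of millions across the whole leaked range, which is in line with the rough figure quoted above.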