From Computer Science to Social Science: New Approaches to Track and Mitigate Online Hate Speech
By Michael Yoder, Collaboratory Against Hate Postdoctoral Researcher
Hate and extremism increasingly spread through digital communication: social media, online forums, articles, and video livestreams. Tracking and mitigating hate and extremism now requires handling this messaging at an enormous scale. Tech companies and computer science researchers have developed automated systems for identifying hate speech and extremist content, but these systems have many shortcomings. Classifiers built to detect broad notions of "hate speech" fail to pick up on the nuances of specific narratives and ideologies, such as white supremacy. Implicit or novel language and visual content often slip through, while classifiers incorrectly flag mentions of marginalized identity terms, even in positive contexts, for censure.
Supported by the Collaboratory Against Hate and my academic advisors, Prof. Kathleen M. Carley and Prof. David West Brown, I’ve been researching new ways of detecting hate and extremism in large datasets of online interaction and communication. This work requires moving beyond technical methods to incorporate what is known about how social movements (including hateful ones) circulate through language and visual modalities. Computer scientists have much to learn from fields such as linguistics, communication, sociology, and area studies.
One key insight from these fields is that a concept like “hate speech” encompasses a variety of related but distinct phenomena. In one project (paper link here), we showed that the language of hate speech varies significantly depending on the identity of those it targets. Most automated hate speech classifiers ignore this variation, which we demonstrated can significantly affect how well they perform. Classifiers trained only on anti-Black racism, for example, will fail to identify anti-LGBTQ+ hate speech.
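To make this concrete, here is a minimal sketch of that kind of cross-target evaluation: train a classifier on posts targeting one identity group, then test it on posts targeting another. This is an illustration only, not the models or data from our paper; the scikit-learn pipeline and the corpus names in the comments are hypothetical placeholders.

```python
# Illustrative sketch: train a simple hate speech classifier on posts
# targeting one identity group, then evaluate it on posts targeting
# another. A large gap between in-domain and cross-target scores
# reflects the variation described above.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

def cross_target_f1(train_texts, train_labels, test_texts, test_labels):
    """Fit on one target group's labeled posts, score on another's."""
    clf = make_pipeline(TfidfVectorizer(min_df=2),
                        LogisticRegression(max_iter=1000))
    clf.fit(train_texts, train_labels)
    return f1_score(test_labels, clf.predict(test_texts))

# Hypothetical usage, with corpora split by the identity targeted:
# in_domain  = cross_target_f1(anti_black_train, anti_black_train_labels,
#                              anti_black_test, anti_black_test_labels)
# cross      = cross_target_f1(anti_black_train, anti_black_train_labels,
#                              anti_lgbtq_test, anti_lgbtq_test_labels)
```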
Hate speech also varies with the ideologies of those producing it. Practitioners have called for systems that detect particular types of extremism (see our CAH white paper). We developed an automated classifier to detect white supremacist language, trained on data collected across online domains: social media platforms and forums known for white supremacist extremism. Knowing that such a classifier would be prone to flag any mention of marginalized identities as hateful, we found that incorporating text from anti-racist perspectives as counterexamples mitigated this bias. We plan to apply this classifier to trace the movement of white supremacist language through networks, to see how communities develop around such ideologies on mainstream and fringe platforms.
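A rough sketch of that counterexample strategy, assuming a simple bag-of-words classifier rather than our actual model: anti-racist text, which also mentions identity terms, is added to the training data with a non-hateful label, so identity terms alone stop being a reliable signal. The corpus names below are hypothetical placeholders.

```python
# Minimal sketch of training with anti-racist counterexamples.
# Hypothetical corpora (placeholders, not our actual datasets):
#   ws_texts         - posts from forums known for white supremacist extremism
#   neutral_texts    - general-domain posts with no extremist content
#   antiracist_texts - anti-racist writing that also mentions identity terms
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_with_counterexamples(ws_texts, neutral_texts, antiracist_texts):
    # Label anti-racist text non-hateful (0) alongside neutral text, so
    # the model cannot learn "mentions a marginalized identity" as a
    # proxy for white supremacist language (1).
    texts = ws_texts + neutral_texts + antiracist_texts
    labels = ([1] * len(ws_texts)
              + [0] * (len(neutral_texts) + len(antiracist_texts)))
    clf = make_pipeline(TfidfVectorizer(min_df=2),
                        LogisticRegression(max_iter=1000))
    return clf.fit(texts, labels)
```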
Understanding the spread of hateful ideologies requires analysis of communication strategies at a large scale. With Collaboratory Against Hate summer interns and faculty experts in communication and antisemitism, we uncovered prominent messaging narratives linking COVID-19 sentiment with antisemitism on Twitter (white paper here). We are now studying how the public-facing offline propaganda of white supremacist groups has roots in online spaces.