Managing online hate effectively and efficiently requires both technology and human moderators. We explore the issues and introduce our new hate speech project for Malaysia.
By Jia Vern Tham & Khairil Ahmad | 16 July 2021
The Atlanta shooting that killed six Asian women earlier this year took place on the back of rising anti-Asian online content in the US. Hateful slurs blaming Asians for the outbreak of COVID-19, along with degrading caricatures and memes of the community, had reportedly surged in far-right social media forums in the months leading up to the attack.
Online hate speech, or hateful content, has become a major concern in societies around the world. Audiences increasingly convene around such content in online spaces, developing or strengthening biases against other groups. The risks of online hate speech leading to violence have been well researched, but there are also broader risks to inter-group trust and cohesion.
Governments and civil society are now paying more attention than ever to finding ways to counter hate speech. At the time of writing, UK Prime Minister Boris Johnson had just issued a warning to social media giants to remove racial abuse from their platforms or face steep fines.
Driven by dramatic events, hate speech detection and removal has only relatively recently become an urgent issue for social media companies, prompting changes to relevant policies and more targeted enforcement. Facebook and Twitter, for example, updated their policies to describe the varying severities of hate speech disallowed on their respective platforms. These companies, alongside Snapchat, have also imposed sanctions against Donald Trump for his role in inciting the Capitol Hill riot.
We laud these developments. For Malaysia, we have also argued for an effective response to hateful content, backed by a localised understanding of what constitutes hate speech in our society’s context. Following a pioneering study which we published last year, we also proposed a cohesive framework for defining, categorising and responding to the range of hate speech intensity in the country.
But how do we define, categorise and respond to something that we do not even gather? Efforts to manage online hateful content of varying seriousness need to start with consistent detection of its trends on social media, which can then feed into further work on determining its seriousness, impact and the appropriate response.
Detecting online hate speech is usually a labour-intensive endeavour, relying on human moderators – whether volunteers or paid workers – to process and filter large volumes of online content. There are obvious issues with this, not least the sheer volume of new online content generated every day. Relying on human moderators also requires investment in a sufficiently diverse team with local knowledge, an investment not many companies or regulators are willing to make. Moreover, because the job involves reviewing unsavoury content, moderation can be a distressing experience for the humans doing it.
To overcome those issues, and with COVID-19 sweeping across the world*, social media platforms have been looking into increasing automation in moderation processes. Last year, Facebook and YouTube moved towards prioritising AI over human moderation, but there are some challenges to iron out.
*Human moderators, whose work is normally carried out within company facilities due to the sensitive nature of the content that they monitor, were sent home as a result of the pandemic, forcing the tech giants to rely instead on algorithms.
The AI models are, as yet, far from perfect. In a study by the University of Oxford and the Alan Turing Institute published last month, some systems were found to be overly lenient (failing to detect variations of hate speech), while others were overly strict (flagging non-hateful posts that merely mention hateful words).
These issues stem from the different ways an AI model is trained to detect hateful content. One approach is to train a machine learning model on a given set of keywords and/or simple templates of expressions before applying it to online content. Systems trained only on keywords inevitably disregard the intricacies of language: many expressions are not inherently offensive, but become so in particular contexts and phrasings. And of course, the keywords and templates put together in the training set are also limited by the diversity of the humans behind its design.
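To illustrate the limitation, here is a minimal sketch of a keyword-only detector. It is not any platform's actual system, and the "keywords" are harmless placeholders rather than real slurs; the point is simply to show how such a filter both over-flags and under-flags because it ignores context.

```python
# A minimal sketch of keyword-only detection (placeholder terms, not real slurs),
# illustrating why ignoring context leads to both false positives and misses.
HATE_KEYWORDS = {"slur_a", "slur_b"}  # hypothetical placeholder keywords

def keyword_flag(post: str) -> bool:
    """Flag a post if it contains any listed keyword, regardless of context."""
    words = post.lower().split()
    return any(word in HATE_KEYWORDS for word in words)

# Over-flags: a post condemning or quoting a slur is still flagged.
print(keyword_flag("we should report anyone using slur_a"))  # True
# Under-flags: hateful phrasing that avoids the exact keywords passes through.
print(keyword_flag("people like them do not belong here"))   # False
```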
Current AI-driven detection, therefore, still comes with problems, and there will still be a role for human moderators. It is nonetheless crucial to keep improving AI models, since relying on human moderation alone will not be effective, efficient or even humane. Continued improvement of automated online hate speech detection will require not only better technology but also better design.
For an effective partnership between AI and humans, the quality of the data fed into the AI matters: when a model is trained on local context, language and dialects, it can detect hate speech more accurately. Some organisations that offer detection services already do this; PeaceTech Lab and Hatebase, for example, rely on NGOs and experts to identify what is potentially dangerous in local contexts, though their work prioritises conflict zones.
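As a rough illustration of what "better data" means in practice, the sketch below trains a generic text classifier on a handful of invented, locally annotated examples. The posts, labels and pipeline are placeholders under our own assumptions; they are not the actual systems of The Centre, PeaceTech Lab or Hatebase.

```python
# A generic sketch: locally annotated examples feeding a simple classifier.
# The posts and labels are invented placeholders, not real data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical posts labelled by moderators familiar with the local context:
# 1 = hateful, 0 = not hateful.
posts = [
    "example hateful post in local slang",
    "example neutral post mentioning the same group",
    "another hateful example",
    "another neutral example",
]
labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(posts, labels)

# New posts are scored rather than hard-classified, so human moderators can
# review borderline cases instead of trusting the model outright.
scores = model.predict_proba(["a new post to review"])[:, 1]
print(scores)
```

The design point is the division of labour: locally grounded annotation improves what the model learns, while probability scores (rather than binary verdicts) keep human moderators in the loop for ambiguous content.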
There has yet to be any large-scale hate speech detection initiative in Malaysia, a country where racially and religiously charged views, at varying levels of hate, are rife. Where hate speech occurs, the burden falls either on victims to escalate their concerns, or on authorities to interpret punitive legislation with overly broad definitions of hate speech.
Building on our research last year, The Centre is working on a hate speech detection initiative in Malaysia by pairing AI with human moderators who are familiar with the local context. This initiative, a proof-of-concept hate speech tracker, will act as a repository of trends and themes for online hateful content in the country.
The tracker will be a useful resource for the general public and policymakers. For the public, it can serve as a means to better understand the degree and nature of intergroup tensions. For policymakers, it can be a means to monitor how potentially dangerous terms and themes targeting groups of people evolve, helping them identify more effective mechanisms to manage online hate speech and update current policy responses.
Detecting online hate speech, however, should not come at the expense of free speech. Critics of online content moderation have rightly pointed out that efforts to manage hate speech can be abused to silence dissenting views. Some hate speech monitoring organisations have tried to address this concern: Hatebase, for instance, tracks hate speech trends while opposing government censorship of speech except in cases of clear incitement to violence.
As an organisation, we too recognise the importance of upholding free speech in Malaysia while balancing it with the responsibilities of living in a multiracial society. To uphold this principle, content identified as hateful would need to meet a published threshold of harm or seriousness before being flagged, a topic we researched last year by gathering Malaysians’ perceptions of the intensity of various speech samples. We will also not signpost or showcase individual tweets, but only present trends and themes.
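As a rough sketch of this aggregate-only approach, the snippet below keeps only posts whose harm score meets an assumed threshold and reports weekly counts per theme rather than the posts themselves. The scores, themes and threshold value are illustrative assumptions, not The Centre's published threshold or categories.

```python
# A sketch of aggregate-only reporting under an assumed harm threshold.
# Scores, themes and the threshold are illustrative placeholders.
from collections import Counter

HARM_THRESHOLD = 0.7  # assumed cut-off; a real threshold would be published separately

# Each record: (harm score from model + moderator review, theme, ISO week)
scored_posts = [
    (0.92, "ethnicity", "2021-W27"),
    (0.55, "ethnicity", "2021-W27"),  # below threshold, not counted
    (0.81, "religion",  "2021-W27"),
    (0.74, "religion",  "2021-W28"),
]

# Count only posts that meet the threshold, grouped by (week, theme);
# no individual post text is stored or displayed.
trends = Counter(
    (week, theme)
    for score, theme, week in scored_posts
    if score >= HARM_THRESHOLD
)

for (week, theme), count in sorted(trends.items()):
    print(f"{week}  {theme}: {count} flagged posts")
```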
No tool or system anywhere in the world can yet reliably detect online hate speech. Detection initiatives therefore need to be complemented with continuous learning of the local context, so that they keep up with an ever-evolving social, cultural and political environment.
And yes, detecting online hate speech is only the first step. Understanding the root causes of different forms of hate in a society is also vital in managing potential inter-group conflict. With detection and better understanding, appropriate action can be taken to mitigate the impact of hate speech in various spheres of society — be it in chat groups, comment sections, or real-life conversations between people.
The Centre is a centrist think tank driven by research and advocacy of progressive and pragmatic policy ideas. We are a not-for-profit and a mostly remote working organisation.