A Safety by Design Governance Approach to Addressing AI-Facilitated Online Harms
The rapid growth of artificial intelligence (AI) is reshaping the digital landscape, amplifying existing online harms and introducing new and complex online safety risks. Generative AI now allows individuals with no technical expertise, using ordinary consumer devices, to create synthetic content that is indistinguishable from authentic content. A report by Netsafe highlights the emerging online safety implications of AI, pointing to the rise of deepfakes as a clear example. Deepfakes are being used to spread falsehoods, create sexualized content, and facilitate personalized communications that target audiences with malicious messages at scale.
Addressing the growing sophistication and volume of AI-facilitated online harms requires a Safety by Design governance approach that targets different stages of the online harm lifecycle and considers both technological and human dimensions of online safety.
Understanding Deepfakes and Their Impact
Deepfakes have become common tools for malicious actors to inflict online harm and distort the information landscape. In the first quarter of 2025 alone, deepfake scams impersonating public figures and private individuals caused global losses of US$200 million. Other malicious uses include creating pornographic content that disproportionately targets women, youth, and children; facilitating sextortion and grooming; and causing severe emotional distress.
AI-generated content is also being used to bolster disinformation and misinformation campaigns aimed at influencing public opinion and electoral outcomes. For example, concerns about foreign interference involving AI-driven disinformation led to the annulment of the first round of Romania's 2024 presidential election. Furthermore, deepfakes are being used in radicalization propaganda, including AI-generated audio of public figures purportedly promoting extremist ideologies.
The Growing Distrust in Digital Media
The harmful use of deepfakes fuels growing public distrust of digital and news media and emboldens bad actors to delegitimize and discredit authentic content. Under this phenomenon, known as the “liar’s dividend,” genuine content can be dismissed as AI-generated, a dynamic already observed in discussions surrounding global conflicts.
A Systems-Level Response to Online Harm Lifecycle
AI governance frameworks often focus on the developers and deployers of AI systems, emphasizing their risk-mitigation obligations. However, online harms are frequently driven by the misuse of technology and by end-user behavior. A holistic Safety by Design governance approach is therefore needed to address the full range of AI-enabled online safety harms, establishing interventions at each stage of the harm lifecycle.
This lifecycle can be framed in terms of a “Prepare-Curb-Respond” framework (see the sketch after this list):
- Prepare: Reduce susceptibility to online harms through digital media literacy efforts.
- Curb: Implement preventive interventions to limit the creation and spread of harmful digital content.
- Respond: Employ reactive measures aimed at remediating the effects of harmful content.
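To make the framework concrete, here is a minimal illustrative sketch, not drawn from the source, of how a platform's trust-and-safety pipeline might organize interventions by lifecycle stage. All names in it (`Stage`, `HarmLifecyclePipeline`, the handlers) are hypothetical, chosen only to show the structure the framework implies.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    PREPARE = "prepare"   # reduce susceptibility (e.g., literacy prompts)
    CURB = "curb"         # limit creation and spread (e.g., upload screening)
    RESPOND = "respond"   # remediate harm (e.g., takedowns, victim support)


@dataclass
class HarmLifecyclePipeline:
    """Hypothetical registry mapping each lifecycle stage to interventions."""
    interventions: Dict[Stage, List[Callable[[str], None]]] = field(
        default_factory=lambda: {stage: [] for stage in Stage}
    )

    def register(self, stage: Stage, handler: Callable[[str], None]) -> None:
        """Attach an intervention to a given stage of the harm lifecycle."""
        self.interventions[stage].append(handler)

    def run(self, stage: Stage, item: str) -> None:
        """Apply every registered intervention for a stage to one item."""
        for handler in self.interventions[stage]:
            handler(item)


# Example: preventive and reactive measures coexist in one pipeline.
pipeline = HarmLifecyclePipeline()
pipeline.register(Stage.CURB, lambda item: print(f"screening upload: {item}"))
pipeline.register(Stage.RESPOND, lambda item: print(f"queueing takedown: {item}"))
pipeline.run(Stage.CURB, "suspected synthetic video")
```

The structure makes the systems-level point explicit: no single stage is sufficient on its own, and preparatory, preventive, and reactive measures must operate side by side.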
International Cooperation for Effective Governance
While domestic policy interventions are crucial, coordinated international governance is also necessary to contend with the transnational nature of AI-enabled online harms. Forums such as the Global Online Safety Regulators Network and the International Network of AI Safety Institutes can enable global collaboration at the intersection of AI and online safety, supporting knowledge sharing on emerging harms and effective policy responses.
Regional Perspectives and Initiatives
In regions undergoing major digital transformation, there is a critical window of opportunity to apply the “Prepare-Curb-Respond” framework while AI-facilitated online harms are still emerging. For example, Fiji’s approach to tackling AI-facilitated online safety harms offers a model for the Pacific. The country’s National Digital Strategy aims to create a safer digital environment through significant reductions in online scams, cyberbullying, and harmful content.
To prepare the public, Fiji has distributed more than 60,000 online safety booklets and supports community-based cybersecurity efforts. A dedicated scam task force has been established to curb the rise of personalized AI scams, and work is underway to improve remedial avenues for victims of online harms.
Meanwhile, in Southeast Asia, significant multilateral efforts are underway to address the cross-border nature of AI-enabled online harms. The Association of Southeast Asian Nations (ASEAN) has published guidance on AI governance and ethics, outlining policy recommendations for addressing the risks of generative AI.
Conclusion: Cultivating a Safer Digital World
To cultivate a safer digital world, governance frameworks must adapt to the constant evolution of AI. It is crucial to recognize that technical safeguards directed at AI models are, on their own, insufficient. There are significant opportunities for innovation in measures that address human-centered concerns, ultimately allowing AI to be harnessed more effectively to combat online harms.