Category: ThinkTank

AI Safeguards: A Step-by-Step Guide to Building Robust Defenses

As AI becomes more powerful, protecting against its misuse is critical. This requires well-designed “safeguards” – technical and procedural interventions to prevent harmful outcomes. Research outlines a structured approach to developing and assessing these safeguards, emphasizing clear requirements, comprehensive planning, robust evidence gathering, and ongoing monitoring. This systematic process helps developers and policymakers build safer and more reliable AI systems.
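
To make the process concrete, here is a minimal sketch, in Python, of the four stages described above (requirements, planning, evidence, monitoring) collected into a single assessment record. All names and fields are illustrative assumptions, not taken from the underlying research.

```python
# A minimal sketch of a four-stage safeguard assessment. All names here
# (SafeguardAssessment, is_ready_for_deployment, etc.) are hypothetical
# illustrations, not from the research summarized above.
from dataclasses import dataclass, field

@dataclass
class SafeguardAssessment:
    name: str                # e.g. "harmful-content filter"
    requirements: list[str]  # what the safeguard must achieve
    plan: list[str]          # interventions intended to meet the requirements
    evidence: dict[str, bool] = field(default_factory=dict)  # test -> passed
    monitoring_in_place: bool = False

    def is_ready_for_deployment(self) -> bool:
        """All four stages must be satisfied before relying on the safeguard."""
        return (
            bool(self.requirements)
            and bool(self.plan)
            and bool(self.evidence) and all(self.evidence.values())
            and self.monitoring_in_place
        )

assessment = SafeguardAssessment(
    name="harmful-content filter",
    requirements=["block disallowed requests", "preserve benign traffic"],
    plan=["input classifier", "output moderation", "human escalation"],
    evidence={"red-team evaluation": True, "regression suite": True},
    monitoring_in_place=True,
)
print(assessment.is_ready_for_deployment())  # True
```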

Read More »

Algorithmic Audits: A Practical Guide to Fairness, Transparency, and Accountability in AI

Algorithmic auditing is crucial for ensuring AI systems are fair, transparent, and accountable. A comprehensive audit should inspect the AI within its operational context, considering the data used and affected individuals. This approach applies to systems used for resource allocation, categorization, and identification in areas like healthcare and finance. Beyond bias, audits should assess social impact, user inclusion, and available recourse. The audit process involves creating model cards, mapping system interactions, identifying bias sources, and conducting bias testing, along with optional adversarial auditing for high-risk systems. Effective audit reports, including internal, public, and periodic versions, are vital for transparency and continuous improvement.
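
As one concrete illustration of the bias-testing step, the sketch below computes selection rates per group and checks them against the common four-fifths rule. The data, the 0.8 threshold, and the function names are illustrative assumptions; the article does not prescribe a specific test.

```python
# A minimal sketch of one bias test an audit might run: the disparate impact
# ratio (each group's selection rate relative to the most-favored group).
from collections import defaultdict

def selection_rates(decisions):
    """decisions: iterable of (group, approved: bool) pairs."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += ok
    return {g: approved[g] / totals[g] for g in totals}

def disparate_impact(decisions, threshold=0.8):
    """Return each group's ratio to the best rate and whether it passes."""
    rates = selection_rates(decisions)
    best = max(rates.values())
    return {g: (r / best, r / best >= threshold) for g, r in rates.items()}

sample = [("A", True), ("A", True), ("A", False),
          ("B", True), ("B", False), ("B", False)]
print(disparate_impact(sample))
# {'A': (1.0, True), 'B': (0.5, False)}  -> group B fails the 4/5 rule
```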

Read More »

AI Explainability: A Practical Guide to Building Trust and Understanding

As AI systems exert increasing influence over our lives, understanding how they arrive at conclusions is paramount. Legal professionals, compliance officers, and policy analysts increasingly need insight into AI’s inner workings in order to build trustworthy and ethical technologies. Explainability, the degree to which people can understand an AI system’s rationale, offers actionable guidance for achieving genuine transparency in governance and ethical AI. It must be carefully tailored to different cultural and societal contexts, including child-centric implementations.
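
For a concrete sense of what explainability tooling can look like, here is a minimal sketch of permutation importance, a standard model-agnostic technique chosen purely for illustration; the article itself does not prescribe this method, and all names in the snippet are assumptions.

```python
# A minimal, model-agnostic explainability sketch: permutation importance.
# `model` can be any callable that scores a feature row.
import random

def permutation_importance(model, X, y, metric, trials=10, seed=0):
    """Average drop in metric score when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, [model(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(trials):
            shuffled = [row[:] for row in X]          # copy each row
            values = [row[col] for row in shuffled]
            rng.shuffle(values)                       # break the column's link to y
            for row, v in zip(shuffled, values):
                row[col] = v
            drops.append(baseline - metric(y, [model(r) for r in shuffled]))
        importances.append(sum(drops) / trials)
    return importances  # larger drop = feature mattered more

# Toy model that only uses feature 0, so feature 1 should score ~0:
model = lambda row: row[0] > 0.5
accuracy = lambda y, preds: sum(a == b for a, b in zip(y, preds)) / len(y)
X = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.6]]
y = [True, False, True, False]
print(permutation_importance(model, X, y, accuracy))
```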

Read More »

AI Governance: Transparency, Ethics, and Risk Management in the Age of AI

AI is rapidly transforming society, creating both opportunities and risks. A proposed AI governance framework emphasizes transparency, ethical development, and robust risk management. Key commitments include documenting models, complying with copyright law, and establishing safety frameworks. The framework is guided by EU values, the AI Act itself, proportionality to risk, future-proofing, and support for SMEs, the broader ecosystem, and innovation. For high-risk AI, providers must define and implement safety and security frameworks, document risk assessments, and undergo independent evaluations. Continuous monitoring, adaptation, and collaboration are crucial for responsible AI development. Non-retaliation protections for workers who report concerns are a key component.

Read More »

AI Ethics Auditing: From Regulatory Push to Building Trustworthy AI

AI systems are increasingly scrutinized for bias and unintended consequences, leading to the rise of AI ethics auditing. This emerging practice aims to evaluate these systems, driven primarily by expected regulations and the need to maintain a positive public image. Though still developing, these audits face challenges including regulatory ambiguity, difficulties in coordinating various expertise, and limited resources. Ultimately, they aim to ensure AI aligns with ethical principles, minimizing potential harm and fostering responsible AI innovation.

Read More »

AI’s Promise and Peril: A Lifecycle Framework for Responsible Innovation

By strategically intervening at key points within the AI lifecycle, we can move towards a future where AI’s immense potential is realized without succumbing to avoidable pitfalls. This structured approach, prioritizing both technical and policy solutions, encourages innovation while proactively addressing risks from model development to user interaction. Ultimately, embracing shared responsibility and continuous monitoring allows us to collaboratively navigate the evolving AI landscape, ensuring its benefits are broadly shared and its harms are effectively minimized.

Read More »

Building Trustworthy AI: A Practical Guide to Risk Mitigation and Compliance

The pursuit of trustworthy and compliant AI is not merely a defensive strategy against regulatory action or public backlash; it’s a proactive path to unlocking unprecedented value and building sustainable competitive advantage. By embracing the outlined strategies, organizations can foster innovation while mitigating risks across the entire AI lifecycle, from initial data handling to long-term model maintenance. This commitment cultivates stronger relationships with customers, attracts top talent, appeals to investors, and, ultimately, ensures that AI serves as a force for progress and stability, rather than a source of unforeseen disruptions.

Read More »

Data Cards: Documenting Data for Transparent, Responsible AI

As AI systems become increasingly prevalent, documenting their data foundation is vital. “Data Cards”—structured summaries of datasets—promote transparency and responsible AI. These cards cover origins, factuals, transformations, and potential limitations, enabling informed decisions, risk mitigation, and more equitable models. A collaborative development process and the OFTEn framework (Origins, Factuals, Transformations, Experience) guide their creation, ensuring comparability and intelligibility and addressing uncertainty. Framing questions at three scopes, telescopes, periscopes, and microscopes, lets a broad audience navigate the data at the level of detail they need. Data Cards function as boundary objects between data producers, agents, and users while helping organizations meet regulatory demands.
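
Below is a hedged sketch of what a Data Card might look like as a structured record, loosely following the OFTEn lenses and the telescope/periscope/microscope question scopes summarized above. Field names and example values are illustrative assumptions, not the actual Data Cards template.

```python
# A minimal sketch of a Data Card as a structured record. Fields are
# illustrative; real Data Cards templates are considerably richer.
from dataclasses import dataclass, field

@dataclass
class DataCard:
    dataset_name: str
    origins: str          # where and how the data was collected
    factuals: str         # counts, formats, label distributions
    transformations: str  # cleaning, filtering, anonymization applied
    experience: str       # known limitations and caveats for downstream use
    # Questions at three scopes let different readers navigate the card:
    telescope_questions: dict[str, str] = field(default_factory=dict)   # big picture
    periscope_questions: dict[str, str] = field(default_factory=dict)   # context
    microscope_questions: dict[str, str] = field(default_factory=dict)  # fine detail

card = DataCard(
    dataset_name="support-tickets-v2",
    origins="Customer tickets collected 2021-2023 with user consent",
    factuals="120k examples, English only, 6 label classes",
    transformations="PII redacted; near-duplicates removed",
    experience="Underrepresents non-English users; labels are single-annotator",
    telescope_questions={"What is this dataset for?": "Intent classification"},
)
```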

Read More »

Governing AI Risks: Anthropic’s Safety-First Approach

As AI systems become more powerful, managing risks is critical. This research explores strategies for safely governing advanced AI development through careful evaluation, monitoring, and hazard mitigation. Understanding these mechanisms is vital for responsible AI integration into society. Safeguards must outpace capabilities to ensure human safety.
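
One way to picture “safeguards must outpace capabilities” is as a deployment gate: capability evaluation scores determine a required safeguard level, and deployment is blocked until that level is actually in place. The thresholds and names below are hypothetical illustrations, not Anthropic’s actual policy values.

```python
# A minimal sketch of capability-threshold gating. All thresholds, levels,
# and names here are hypothetical.

CAPABILITY_THRESHOLDS = [  # (min eval score, required safeguard level)
    (0.0, 1),
    (0.5, 2),
    (0.8, 3),
]

def required_safeguard_level(eval_score: float) -> int:
    """Map a capability evaluation score to the safeguard level it demands."""
    level = 1
    for threshold, required in CAPABILITY_THRESHOLDS:
        if eval_score >= threshold:
            level = required
    return level

def may_deploy(eval_score: float, implemented_level: int) -> bool:
    """Deploy only if implemented safeguards meet or exceed what is required."""
    return implemented_level >= required_safeguard_level(eval_score)

print(may_deploy(eval_score=0.85, implemented_level=2))
# False -> pause deployment until level-3 safeguards are implemented
```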

Read More »

AI Governance in the Age of Regulation: Preparing for the AI Act

AI systems are transforming industries, but widespread adoption brings ethical, privacy, and societal challenges that demand new regulation. Organizations must adapt to ensure responsible and compliant AI practices. The risk-based EU AI Act aims to safeguard rights while promoting innovation. Compliance requires AI literacy, a system inventory, risk classification, and role awareness. Organizations should treat AI Act compliance as a standard project with defined stages: build inventory documentation, implement transparency measures, and understand how roles shift when working with AI systems, including the legal ramifications of deployment, distribution, or import.
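
As a sketch of what an inventory with risk classification might look like, the snippet below models entries using the AI Act’s broad risk tiers and actor roles. The classifications shown are simplified illustrations, not legal guidance.

```python
# A minimal sketch of an AI system inventory with risk-based classification.
# Tier and role names reflect the AI Act's broad categories; the example
# classifications are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum

class RiskTier(Enum):
    UNACCEPTABLE = "prohibited practice"
    HIGH = "high risk (strict obligations)"
    LIMITED = "limited risk (transparency duties)"
    MINIMAL = "minimal risk"

class Role(Enum):
    PROVIDER = "provider"
    DEPLOYER = "deployer"
    DISTRIBUTOR = "distributor"
    IMPORTER = "importer"

@dataclass
class InventoryEntry:
    system_name: str
    purpose: str
    role: Role
    tier: RiskTier

inventory = [
    InventoryEntry("cv-screener", "rank job applicants", Role.DEPLOYER, RiskTier.HIGH),
    InventoryEntry("support-chatbot", "answer FAQs", Role.PROVIDER, RiskTier.LIMITED),
]
for entry in inventory:
    print(f"{entry.system_name}: {entry.tier.value} as {entry.role.value}")
```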

Read More »