Consent-Centric Data Challenges for AI Development in India

In the field of Artificial Intelligence (AI), data is the key driving force for training advanced models. Systems such as Large Language Models (LLMs) thrive on large volumes of high-quality data. However, India's Digital Personal Data Protection (DPDP) Act and its rules, grounded in express, informed, and ongoing consent, raise both ethical and practical concerns. This article examines the implications of the DPDP Act's consent-centric design for AI development, especially in sectors that depend on curated, proprietary data.

Consent-Centric Data Governance

India's DPDP Act represents a significant milestone in the country's approach to data protection. The accompanying DPDP Rules emphasize that personal data must be collected on the basis of consent from the data principal, with only limited carve-outs, such as certain publicly available data. Unlike the European Union's (EU) General Data Protection Regulation (GDPR) and Brazil's Lei Geral de Proteção de Dados (LGPD), this framework treats consent as the central processing basis, recognising only a narrow set of "legitimate uses" and omitting broader alternative grounds, such as contractual necessity and legitimate interests, that provide processing flexibility under leading international data protection regimes.

With AI development unfolding rapidly, consent as the basis for data protection works at cross purposes with the prevailing mode of data collection used to train large AI models. While the DPDP Act aims to protect individual rights by making data collection practices more transparent and accountable, this regulatory development comes at a time when AI developers increasingly need data that is not easily accessible to the public. Ernst & Young's (EY) reports on sectoral AI development highlight that high-quality, carefully curated datasets are essential for effective LLM training, a finding corroborated by analyses of the specific challenges of developing generative models. The emphasis on explicit, granular consent poses a significant conundrum in this context: how can a consent-centric framework for data protection be reconciled with the data requirements of AI innovation?

The Conundrum of Curated Data for Sector-Specific AI

The foundation of AI systems like LLMs rests on their training data. In critical sectors such as healthcare, banking, and online advertising, data collection follows regulated protocols, often drawing from exclusive sources inaccessible to the general public. Within the DPDP Act framework, a Consent Manager is an entity registered with the Data Protection Board of India that provides a transparent, accessible, and interoperable platform through which data principals can grant, manage, review, and revoke their consent; it serves as the primary intermediary between individuals and businesses.
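To make the Consent Manager's role concrete, the sketch below models, in Python, the kind of purpose-bound consent registry such a platform implies: grants are tied to a specific fiduciary and purpose, can be reviewed and withdrawn at any time, and gate any downstream processing. This is an illustrative assumption, not the statutory design; the ConsentManager and ConsentRecord names, their fields, and the is_permitted check are all invented for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class ConsentStatus(Enum):
    GRANTED = "granted"
    WITHDRAWN = "withdrawn"


@dataclass
class ConsentRecord:
    """One data principal's consent for one stated purpose."""
    principal_id: str   # pseudonymous identifier of the data principal
    fiduciary_id: str   # the business (data fiduciary) seeking the data
    purpose: str        # single, specific purpose, e.g. "model-training"
    status: ConsentStatus = ConsentStatus.GRANTED
    granted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ConsentManager:
    """Registry through which principals grant, review, and withdraw consent."""

    def __init__(self):
        self._records = {}  # principal_id -> list of ConsentRecord

    def grant(self, principal_id, fiduciary_id, purpose):
        record = ConsentRecord(principal_id, fiduciary_id, purpose)
        self._records.setdefault(principal_id, []).append(record)
        return record

    def review(self, principal_id):
        # Transparency: a principal can see every consent they have given.
        return list(self._records.get(principal_id, []))

    def withdraw(self, principal_id, fiduciary_id, purpose):
        # Withdrawal must be as easy as granting: mark matching grants.
        for record in self._records.get(principal_id, []):
            if (record.fiduciary_id, record.purpose) == (fiduciary_id, purpose):
                record.status = ConsentStatus.WITHDRAWN

    def is_permitted(self, principal_id, fiduciary_id, purpose):
        # Processing is allowed only under an active, purpose-specific grant.
        return any(
            (r.fiduciary_id, r.purpose) == (fiduciary_id, purpose)
            and r.status is ConsentStatus.GRANTED
            for r in self._records.get(principal_id, [])
        )
```

A real Consent Manager would add authentication, notice delivery, and interoperability across fiduciaries; the point of the sketch is only that consent becomes a first-class, auditable record rather than a one-off checkbox.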

However, this consent-based approach creates a fundamental tension in AI development. Requiring case-by-case consent significantly reduces the volume of available training data, creating a challenge with multiple dimensions. While consent-centric frameworks aim to build trust and keep data principals in control, they also introduce new problems for AI innovation. For instance, an additional layer of complexity arises at the intersection of data protection and copyright law: recent cases highlight the legal issues that arise when copyright-protected, curated data is used to train AI models.

Given that LLMs need vast datasets to function, whether developers can realistically negotiate consent for each data element, including copyrighted content, remains an open question. The tension between the need for comprehensive datasets and stringent consent requirements illustrates the challenges ahead for AI developers in India.

Global Perspectives on Privacy and Innovation

Outside the Indian context, similar findings offer a broader perspective on the balance between privacy and innovation. Reports indicate that consent models may be difficult to apply in today's data-intensive AI environment, and it has been argued that privacy protection cannot rest on individual consent alone, as earlier frameworks presupposed. The conflict between data protection and data utility has been highlighted in various studies, suggesting that even though the DPDP Act is ethical in its consent-centric approach, insufficient flexibility in its implementation may hinder technological advancement.

A more flexible framework, as envisioned in best practices such as those embodied in the EU AI Act, would pair responsible data governance and management practices with anonymisation techniques that account for context in addition to direct identifiers. These approaches highlight the importance of situating privacy protection within comprehensive risk assessment frameworks that consider vulnerabilities arising from data linkages and the consequent risks that data interfaces create.
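One established way to reason about such context-driven risk is k-anonymity: a record is protected only if at least k-1 others share its combination of quasi-identifiers. The Python sketch below is offered as an illustration under that assumption; the EU AI Act does not prescribe this specific technique, and the field names are invented.

```python
from collections import Counter

# Quasi-identifiers: fields that are not direct identifiers but can
# re-identify a person in combination (the "context" around a record).
QUASI_IDENTIFIERS = ("age_band", "pincode_prefix", "gender")


def k_anonymity(records):
    """k = size of the smallest group of records sharing the same
    quasi-identifier combination; higher k means lower linkage risk."""
    groups = Counter(tuple(r[q] for q in QUASI_IDENTIFIERS) for r in records)
    return min(groups.values()) if groups else 0


# Illustrative records with names and IDs already stripped out.
records = [
    {"age_band": "30-39", "pincode_prefix": "1100", "gender": "F"},
    {"age_band": "30-39", "pincode_prefix": "1100", "gender": "F"},
    {"age_band": "40-49", "pincode_prefix": "5600", "gender": "M"},
]

# Prints 1: the third record is unique in its group, so it remains
# re-identifiable from context alone despite carrying no direct identifier.
print(k_anonymity(records))
```

Even with every name and ID removed, a k of 1 signals that linkage on context alone can re-identify someone, which is exactly the vulnerability the risk-based framings above emphasise.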

Balancing Innovation with Ethical Imperatives

The issue, therefore, is to find a balance between two competing and equally important goals. On one hand, ethical and legal positions aim to protect individual privacy by ensuring that people consent knowingly and can withdraw that consent at any time. On the other hand, AI development makes technological demands for large, well-organised datasets. The further development of AI in India depends on the availability of structured data, accessed through specific mechanisms that must nonetheless conform to the DPDP Act.

Technological solutions can help. Consent Managers make consent handling more efficient while maintaining proper records and audit trails, though they add an extra layer of compliance. Blockchain-style techniques can make consent records tamper-evident and transparent, and, combined with sound anonymisation methods, can help protect individual identities during data analysis. Together, these tools create a data environment that respects data principals' rights while supporting the development of AI technologies.
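To illustrate the blockchain point, the following minimal Python sketch implements a tamper-evident, hash-chained consent log: each entry commits to the hash of the previous one, so retroactively altering any record breaks verification. It is a toy (a single append-only list, not a distributed ledger), and every name in it is illustrative.

```python
import hashlib
import json
from datetime import datetime, timezone


def _hash_entry(entry):
    # Canonical JSON so the same entry always hashes identically.
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ConsentLog:
    """Append-only log; each record chains to the hash of the previous one."""

    def __init__(self):
        self._entries = []

    def append(self, principal_id, action, purpose):
        prev_hash = self._entries[-1]["hash"] if self._entries else "0" * 64
        body = {
            "principal_id": principal_id,
            "action": action,  # e.g. "grant" or "withdraw"
            "purpose": purpose,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev_hash,
        }
        self._entries.append({**body, "hash": _hash_entry(body)})

    def verify(self):
        # Recompute every hash; tampering with any earlier entry
        # invalidates all later prev_hash links.
        prev_hash = "0" * 64
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or _hash_entry(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

The design choice that matters is the chaining: an auditor who trusts only the latest hash can detect an edit anywhere in the history.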

However, policy adaptations are also crucial. Standardised consent templates can reduce consent fatigue for both data principals and researchers (a hypothetical example of such a template is sketched below). Sector-specific exemptions and regulatory sandboxes may be needed, given the nature of business in industries that depend on curated data. Regulations could permit limited data sharing under conditions that protect individuals' privacy and consent while providing LLMs and other AI systems with the quality data they need.
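As a sketch of what a standardised template might look like, the hypothetical Python example below fixes the fields every consent notice must state (purpose, data categories, retention, and a withdrawal contact) and validates a notice against them. The field set is an assumption made for illustration; the DPDP Rules do not define this schema.

```python
# Hypothetical standardised template: the fixed set of fields every consent
# notice must state, so data principals see the same structure everywhere.
CONSENT_TEMPLATE_FIELDS = {
    "fiduciary_id": str,        # who is asking
    "purpose": str,             # the single, specific purpose
    "data_categories": list,    # exactly which data items are sought
    "retention_days": int,      # how long the data will be kept
    "withdrawal_contact": str,  # where consent can be withdrawn
}


def validate_consent_notice(notice):
    """Return a list of problems; an empty list means the notice conforms."""
    problems = []
    for name, expected_type in CONSENT_TEMPLATE_FIELDS.items():
        if name not in notice:
            problems.append(f"missing field: {name}")
        elif not isinstance(notice[name], expected_type):
            problems.append(f"{name} should be a {expected_type.__name__}")
    return problems


notice = {
    "fiduciary_id": "example-hospital",
    "purpose": "diagnostic-model-training",
    "data_categories": ["radiology-images"],
    "retention_days": 365,
    "withdrawal_contact": "privacy@example.org",
}
assert validate_consent_notice(notice) == []
```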

Conclusion

India's consent-based data protection regime, while protecting individual rights through informed consent mechanisms, may create operational challenges for AI innovation. Striking the balance between privacy protection and technological innovation will depend on effective solutions such as responsive, risk-based regulatory frameworks, including sandboxes and exemptions, alongside industry-led methods. These will help policymakers and industry leaders collaboratively design an ethical framework conducive to AI-driven progress, ensuring that India remains at the forefront of responsible technological evolution.
