Key GDPR Considerations for Using Patient Data in AI-Driven Research

The integration of artificial intelligence (AI) into scientific research has opened new avenues for innovation, particularly in identifying potential new medicines for clinical trials. However, using patient data to train these AI systems raises significant legal and ethical questions, especially in light of the General Data Protection Regulation (GDPR).

Understanding the Applicability of GDPR

Where an organization develops its AI system internally without using patient data, the GDPR does not apply, because no personal data is processed (and the AI Act's exemption for systems developed solely for scientific research may also be available). The landscape shifts dramatically once patient data is involved: the GDPR then becomes applicable, requiring organizations to navigate a complex web of obligations.

Patient data can be sourced from various avenues, including:

  • Healthcare records collected by institutions, such as the NHS
  • Voluntary registries where patients consent to share their data for research
  • Data obtained from prior clinical trials

It is crucial to note that these datasets were originally collected for specific purposes, necessitating a thorough assessment to determine their compatibility for repurposing in AI training.

Assessing Compatibility for Secondary Use of Patient Data

The GDPR enshrines a purpose limitation principle, mandating that personal data must be collected for a specific, explicit, and legitimate purpose. If the data is to be reused for a different objective, a compatibility assessment is required. Several factors influence this assessment:

  • Link between the initial and secondary purposes: The closer the connection, the more likely it is to be deemed compatible.
  • The context and the reasonable expectations of data subjects: If data subjects were informed at the time of collection about potential secondary uses, reuse is more likely to be justified.
  • Nature of the data: The more sensitive the personal data (e.g., health data), the narrower the scope for compatibility.
  • Consequences for data subjects: Both positive and negative consequences must be evaluated.
  • Existence of appropriate safeguards: Measures like encryption, pseudonymization, transparency, and opt-out options should be considered.

The purpose limitation principle aims to maintain individuals’ control over their data and prevent unauthorized repurposing. Scientific research is generally considered a compatible secondary use, provided that appropriate safeguards are in place.
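Among the safeguards listed above, pseudonymization is the most concrete from an engineering standpoint. As a minimal illustrative sketch (the identifier format and keys are hypothetical, and the GDPR does not mandate any particular technique), a keyed hash keeps records linkable for research while separating them from direct identifiers, provided the key is stored apart from the research dataset:

```python
import hmac
import hashlib

def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The mapping is repeatable (same ID -> same token), so records can
    still be linked across a dataset for research, but re-identification
    requires the secret key, which must be held separately from the data.
    """
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()

# Illustrative identifiers and keys (not real values):
key_a = b"project-a-secret"
key_b = b"project-b-secret"

token_1 = pseudonymize("NHS-1234567", key_a)
token_2 = pseudonymize("NHS-1234567", key_a)
token_3 = pseudonymize("NHS-1234567", key_b)

assert token_1 == token_2   # stable within one project
assert token_1 != token_3   # unlinkable across projects with distinct keys
```

Using a distinct key per research project, as sketched here, also limits linkability across datasets, which supports the data minimization principle.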

Scientific Research and GDPR Compliance

While the GDPR does not explicitly define ‘scientific research’, Recital 159 suggests a broad interpretation, encompassing technological development, fundamental and applied research, and both privately and publicly funded studies. The European Data Protection Board (EDPB) advises that scientific research must adhere to established ethical and methodological standards. Both the discovery phase and clinical research typically follow strict methods or protocols and thus should qualify as scientific research.

Despite plans for the EDPB to issue guidance in 2021 on defining scientific research and appropriate safeguards, such clarity has not yet materialized. Therefore, organizations should not assume automatic compatibility but instead conduct a thorough compatibility assessment.

Compatibility Assessment Outcomes

If the secondary use is found to be incompatible with the initial collection, the data cannot be reused for the secondary purpose unless:

  • Such processing is based on the explicit consent of the data subject.
  • A Union or Member State law safeguards important objectives of general public interest (e.g., public health).

If the secondary use is compatible with the initial collection, organizations may rely on the legal basis used for the original data collection. Nonetheless, all other data protection principles must still be respected, including informing data subjects about the further processing and their rights, and conducting a data protection impact assessment (DPIA) where required. The compatibility assessment and the adopted measures must be documented to satisfy the accountability principle.
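The GDPR prescribes no particular format for documenting a compatibility assessment. As a purely illustrative sketch (the field names and example values are assumptions, not requirements of the Regulation), the factors listed earlier could be captured as a structured, serializable record for the accountability file:

```python
from dataclasses import dataclass, field, asdict
from datetime import date

@dataclass
class CompatibilityAssessment:
    """Illustrative record of a compatibility assessment under the GDPR's
    purpose limitation principle. Field names mirror the assessment
    factors; they are not prescribed by the Regulation."""
    initial_purpose: str
    secondary_purpose: str
    link_between_purposes: str
    data_subject_expectations: str
    nature_of_data: str
    consequences_for_subjects: str
    safeguards: list = field(default_factory=list)
    outcome: str = "pending"  # "compatible" or "incompatible"
    assessed_on: date = field(default_factory=date.today)

# Hypothetical example values for illustration only:
assessment = CompatibilityAssessment(
    initial_purpose="clinical care",
    secondary_purpose="training an AI model for drug discovery",
    link_between_purposes="both concern patient health outcomes",
    data_subject_expectations="patients informed of possible research use",
    nature_of_data="special-category health data (Art. 9)",
    consequences_for_subjects="no individual-level decisions; aggregate findings only",
    safeguards=["pseudonymization", "encryption at rest", "opt-out mechanism"],
    outcome="compatible",
)

record = asdict(assessment)  # plain dict, ready to serialize and archive
```

Keeping such records in a versioned, queryable form makes it straightforward to evidence the assessment if a supervisory authority asks for it.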

Exceptions Under Article 9 of the GDPR for Processing Health Data

Even when a secondary use is considered compatible, processing health data additionally requires an exception under Article 9 of the GDPR, since health data is a special category of personal data. Potential exceptions include:

  • Explicit consent (art. 9(2)(a)).
  • Reasons of public interest in the area of public health based on Union or Member State law (art. 9(2)(i)).
  • Scientific research based on Union or Member State law with the adoption of appropriate safeguards (art. 9(2)(j)).

Conclusion

The use of patient data for developing or training AI systems in scientific research necessitates careful regulatory consideration. When patient data is involved, compliance with the GDPR becomes paramount. Organizations must assess the compatibility of secondary data use while adhering to all GDPR principles and obligations to ensure ethical and legal integrity in their research practices.
