On 18 December 2024, the European Data Protection Board (EDPB) issued its highly anticipated Opinion on the use of personal data in the development and deployment of AI models. The EDPB's overarching intention is to foster responsible AI innovation while ensuring that personal data is protected in full compliance with the GDPR.
The Opinion was requested by the Irish Data Protection Commission (DPC) with the aim of achieving regulatory harmonisation in respect of AI models across Europe. The DPC was particularly focused on the use of personal data in large language models (LLMs), AI systems that generate and comprehend human language using vast amounts of text data, which are considered to pose significant risks of personal data misuse.
Within the Opinion, the EDPB provides responses to a number of queries put forward by the DPC, which are considered in this article. These can broadly be summarised as follows:
- How AI models can be considered ‘anonymous’: AI models must be evaluated on a case-by-case basis to ensure that individuals cannot be identified from them, directly or indirectly;
- How the legal basis of ‘legitimate interest’ can be utilised for developing and deploying AI models: There is no hierarchy between the legal bases provided by the GDPR. Such development and deployment of AI models can (in certain scenarios) be justified under the basis of legitimate interest, but usage of personal data must be strictly necessary and balanced against individuals' rights; and
- What is the impact on the lawfulness of using an AI model if it was developed with unlawfully processed personal data: AI models developed using unlawfully processed personal data must be scrutinised to ensure compliance with GDPR principles.
Anonymity of AI Models
A key aspect of the EDPB's Opinion is whether AI models trained on personal data can be considered ‘anonymous’; the EDPB ultimately concludes that not all such models can.
The Opinion emphasises that the anonymity of AI models should be evaluated on a case-by-case basis by Supervisory Authorities (SAs). For an AI model to be considered anonymous, it must be highly unlikely to:
- allow individuals whose data was used to create the model to be identified, directly or indirectly; and
- allow personal data to be extracted from the model through queries, taking into account ‘all the means reasonably likely to be used’ by the controller or another person.
The Opinion provides a non-exhaustive list of methods to demonstrate anonymity, which SAs can use in their assessments.
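By way of illustration, one way of probing for a lack of anonymity is a regurgitation test: querying the model with adversarial prompts and checking whether personal data known to be in the training set appears in the output. The sketch below is a minimal, hypothetical Python example; the `generate` callable, the prompt set and the known-PII list are assumptions made for illustration, not methods prescribed by the Opinion.

```python
import re
from typing import Callable, Iterable

# One common category of personal data: email addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def probe_for_regurgitation(
    generate: Callable[[str], str],   # hypothetical wrapper around the model under test
    prompts: Iterable[str],           # adversarial prompts, e.g. partial training records
    known_pii: set[str],              # personal data known to be in the training set
) -> list[tuple[str, str]]:
    """Return (prompt, leaked_value) pairs where the model emitted known PII."""
    leaks = []
    for prompt in prompts:
        output = generate(prompt)
        for email in EMAIL_RE.findall(output):
            if email in known_pii:
                leaks.append((prompt, email))
    return leaks
```

A non-empty result would suggest the model cannot be treated as anonymous; an empty result is weaker evidence, as it shows only that these particular prompts failed to extract personal data.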
In this assessment, the EDPB does not refer to the Discussion Paper from the Hamburg Commissioner for Data Protection and Freedom of Information, which maintains that LLMs do not inherently contain personal data because the data used to train them is transformed into abstract mathematical representations and probability weights. On the Hamburg DPA's view, data subject rights under the GDPR apply not to the models themselves but to the input and output data processed by the AI system.
It is also notable that the EDPB makes no reference to the Article 29 Working Party's 2014 Opinion on Anonymisation Techniques, which clarified that data is still considered personal if the original identifiable dataset is retained, even where identifiers are removed or masked. Only when data is aggregated to a level at which individual events are no longer identifiable can it be considered anonymous. For example, individual travel patterns remain personal data while the raw data is accessible, but aggregated statistics, such as general passenger trends, can be deemed anonymous.
Legitimate Interest as a Legal Basis
In practice, legitimate interest is likely to serve as the primary legal basis for AI model developers, given the impracticality of obtaining consent from every individual whose data appears in a training dataset.
The Opinion outlines general considerations for SAs when evaluating whether legitimate interest is an appropriate legal basis for processing personal data in AI development and deployment. A three-step test applies when assessing reliance on legitimate interest:
- identifying the legitimate interest pursued;
- analysing the necessity of the processing for the legitimate interest; and
- ensuring the legitimate interest is not overridden by the data subjects' interests or fundamental rights and freedoms.
The balancing test in the third step will inevitably be the most challenging for AI developers. If the impact on data subjects outweighs the AI developer's interests, appropriate measures must be implemented to mitigate this impact.
The EDPB suggests the following mitigating measures:
- Technical De-identification Measures: Implementing de-identification techniques, such as training on synthetic datasets or pseudonymising records so that data cannot be combined on the basis of individual identifiers, can help mitigate risks (a pseudonymisation sketch follows this list).
- Measures that Facilitate the Exercise of Individual Rights: Developers should enable data subjects to exercise their rights, such as the right to erasure. For example, developers should implement a reasonable period between data collection and use, allowing time for data subjects to exercise their rights. They should offer an unconditional opt-out option before processing begins, enhancing control over personal data. Additionally, developers should permit the right to erasure even when not explicitly required by GDPR and allow data subjects to submit claims regarding data regurgitation, enabling appropriate unlearning techniques.
- Transparency Requirements: Developers must inform data subjects about the processing activities. This can be particularly challenging when training data is scraped from public sources. Additionally, compliance with transparency requirements does not automatically ensure that the processing aligns with the reasonable expectations of the data subjects. Factors that can determine whether individuals can reasonably expect certain uses of their personal data include the public availability of the data, the relationship between the individual and the controller, the context and source of data collection, potential further uses of the model, and individuals' awareness that their data is online.
- Specific Mitigating Measures in Web Scraping: When scraping data from the web, developers should avoid collecting pages likely to contain sensitive information and exclude websites that explicitly prohibit scraping or the reuse of their content for AI training purposes (a robots.txt check is sketched after this list).
These measures aim to balance the interests of AI developers with the rights and expectations of data subjects, ensuring responsible and compliant AI development.
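To illustrate the first measure, the sketch below shows one plausible pseudonymisation approach: replacing direct identifiers with keyed hashes (HMAC-SHA256) before data enters a training pipeline, so that records can still be joined consistently while the underlying identifiers remain hidden from anyone without the key. The field names and key handling are illustrative assumptions, not requirements set out in the Opinion.

```python
import hashlib
import hmac

def pseudonymise(record: dict, key: bytes, id_fields=("name", "email")) -> dict:
    """Replace direct identifiers with keyed hashes (HMAC-SHA256).

    The key must be stored separately from the training data; without it,
    the pseudonyms cannot be linked back to individuals.
    """
    out = dict(record)
    for field in id_fields:
        if out.get(field) is not None:
            digest = hmac.new(key, str(out[field]).encode(), hashlib.sha256)
            out[field] = digest.hexdigest()[:16]  # truncated token, still join-consistent
    return out

# The same key yields the same pseudonym for the same identifier,
# so datasets can be linked internally without exposing the identifier.
key = b"example-secret-key"  # in practice: generated, rotated and access-controlled
print(pseudonymise({"name": "Jane Doe", "email": "jane@example.com", "age": 34}, key))
```

Note that pseudonymised data remains personal data under the GDPR; measures of this kind reduce risk in the balancing test rather than taking the processing outside the GDPR altogether.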
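On the web-scraping measure, one concrete opt-out signal a scraper can honour is a site's robots.txt file. The sketch below uses Python's standard-library robotparser to check whether a given URL may be fetched; the user-agent string and the conservative skip-on-error default are assumptions, and robots.txt is only one of several opt-out signals a site may use (terms of service being another).

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def may_scrape(url: str, user_agent: str = "ExampleAITrainingBot") -> bool:
    """Check a site's robots.txt before collecting a page as training data."""
    parts = urlparse(url)
    parser = RobotFileParser()
    parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        parser.read()  # fetches and parses the site's robots.txt
    except OSError:
        return False  # conservative default: skip the site if robots.txt is unreadable
    return parser.can_fetch(user_agent, url)

# Only collect pages the site has not excluded for this user agent.
if may_scrape("https://example.com/articles/page-1"):
    print("allowed to fetch")
```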
Consequences of Unlawful Processing
The EDPB clarifies that unlawful processing of personal data in an AI model's development phase can affect the lawfulness of the model's subsequent use. Where a model has been developed using unlawfully processed personal data, its deployment may in turn be unlawful unless the model has first been properly anonymised. The Opinion offers guidance for a case-by-case analysis, reflecting the diversity and rapid evolution of AI models.
Importantly, the EDPB confirms that if an SA determines that an AI model was developed unlawfully under the GDPR, it has the authority to order the deletion of the model (where proportionate) or to allow individuals to opt out of having their data used. This underscores the critical need for compliance with data protection regulations throughout the AI development process.
Takeaways
Ultimately, the Opinion emphasises that the development and use of AI models must be carefully balanced with the protection of personal data. EDPB Chair Anu Talus made the following statement in conjunction with the release of the Opinion:
“AI technologies offer numerous opportunities and benefits across various sectors. It is crucial to ensure these innovations are conducted ethically and safely, benefiting everyone. The EDPB aims to support responsible AI innovation by ensuring personal data protection in full compliance with the General Data Protection Regulation (GDPR).”
Key aspects to focus on within the Opinion are the importance of transparency, the implementation of robust technical measures, and the facilitation of individuals' rights to ensure compliance with the GDPR. It remains to be seen how SAs will interpret these guidelines in the enforcement phase, now that the Opinion has been made public; regardless, AI model developers should take careful note of the Opinion and implement an AI governance programme to ensure all AI models and systems are properly assessed before deployment.
If you have any questions or would otherwise like to discuss any issue raised in this article, please contact Tom Whittaker, Brian Wong, Lucy Pegler, Martin Cook, Liz Smith or any other member in our Technology team.
This article was written by Victoria McCarron.