Generative AI – AI which generates content, such as text, video and audio, that resembles human-made content – is a hot topic:
- the UK recently published its White Paper on proposals to regulate AI;
- there is significant investor interest, defying the downturn in the broader technology market;
- hundreds of foundation models used for generative AI are available. Foundation models are a type of AI model trained on a vast quantity of data and adaptable to a wide range of tasks. Generative AI typically uses large language models (LLMs), a type of foundation model. LLMs identify patterns in human language data (e.g. news articles and blog posts) in order to generate content; often, the larger the training dataset, the more nuanced the patterns identified and the content subsequently generated; and
- the technology is evolving quickly. For example, GPT-4’s Technical Report shows how GPT improved between recent versions (3.5 and 4) across a diverse set of benchmarks, including exams such as the US Bar Exam for qualification as a lawyer.
However, such AI poses risks both in terms of the technology and its uses. What are the key data protection and cyber security risks that come with this exciting new application of AI technology?
Engage with data protection obligations from the outset and keep them under review
LLMs (such as ChatGPT) and their use cases – from writing essays to powering chatbots to creating websites without human coding involved – have captured the world’s imagination. But, as Stephen Almond (the ICO’s Executive Director, Regulatory Risk, who leads its team responsible for anticipating, understanding and shaping the impacts of emerging technology and innovation on people and society) has noted, it is important to take a step back and reflect on how personal data is being used by a technology that has made its own CEO “a bit scared”.
The ICO has written that those looking to develop or use generative AI should ask the following questions:
- What is your lawful basis for processing personal data? If you are processing personal data you must identify an appropriate lawful basis, such as consent or legitimate interests.
- Are you a controller, joint controller or a processor? If you are developing generative AI using personal data, you have obligations as the data controller. If you are using or adapting models developed by others, you may be a controller, joint controller or a processor.
- Have you prepared a Data Protection Impact Assessment (DPIA)? You must assess and mitigate any data protection risks via the DPIA process before you start processing personal data. Your DPIA should be kept up to date as the processing and its impacts evolve.
- How will you ensure transparency? You must make information about the processing publicly accessible unless an exemption applies. If it does not take disproportionate effort, you must communicate this information directly to the individuals to whom the data relates.
- How will you mitigate security risks? In addition to personal data leakage risks, you should consider and mitigate risks of model inversion and membership inference, data poisoning and other forms of adversarial attacks.
- How will you limit unnecessary processing? You must collect only the data that is adequate to fulfil your stated purpose. The data should be relevant and limited to what is necessary.
- How will you comply with individual rights requests? You must be able to respond to people’s requests for access, rectification, erasure or other information rights.
- Will you use generative AI to make solely automated decisions? If so – and these have legal or similarly significant effects (e.g. major healthcare diagnoses) – individuals have further rights under Article 22 of UK GDPR.
If you have any queries or need specialist advice relating to data protection, technology or intellectual property, please contact David Varney or speak to our Technology Team.
This article was written by Tom Whittaker and Alice Willoughby.