The Open Data Institute - a non-profit seeking to promote trust in data - has produced a taxonomy of the data involved in developing, using and monitoring foundation AI models and systems (here). The report is described as "a response to the way that the data used to train models is often described as if a static, singular blob, and to demonstrate the many types of data needed to build, use and monitor AI systems safely and effectively."
It covers terms such as:
- for developing AI systems - existing data; training data; reference data; fine-tuning data; testing and validation data; benchmarks; synthetic data
- for deploying AI systems - model weights; local data; prompts; model outputs
- for monitoring AI systems - data about models; data about model usage and performance; registers of models.
Whilst the taxonomy is focussed on foundation AI models and systems, the researchers suspect that much of it will apply to smaller foundation models, too.
The taxonomy is a useful addition to a growing body of work seeking to improve discussion about AI systems, such as NIST's terminology of adversarial machine learning attacks and mitigations, as well as definitions (and their explanations) contained in proposed and enacted legislation (see our blog for our latest glossary on AI terms as used in proposed and enacted laws and regulations).
If you would like to discuss how current or future regulations impact what you do with AI, please contact Tom Whittaker, Brian Wong, Lucy Pegler, David Varney, or Martin Cook.
For the latest on AI law and regulation, see our blog and newsletter.