Could machine learning be used to assist evaluators in regulated public procurements?, Tom Whittaker

‘Machine learning has the potential to revolutionize a wide range of industries and applications’. So says ChatGPT – a chatbot based on the world’s largest artificial neural network – when asked the question ‘what is the potential for machine learning?’ Could one of those applications be assisting evaluators in regulated procurements? If so, what are some of the legal risks that will need to be addressed before a court may accept a Machine Learning (ML) (or other Artificial Intelligence, ‘AI’) system’s role in evaluation?

At this stage it appears unlikely any software will be evaluating alone. However, there is potentially an attractive role for ML and in particular Natural Language Processing (NLP) to assist human evaluators.

Public Procurement Requirements

Among other things, successful evaluation in public procurement needs to be transparent, as objective as possible and based upon subject matter expertise (i.e. a full and detailed understanding of the product/service to be provided). One key aspect of transparency is the keeping of clear records of how the decisions were reached and why. Judges (understandably) look to evaluators to take responsibility for their decisions and reasoning and to be able to justify and explain them.

ML, while having no fixed technical or legal definition[1], is (generally speaking) a means of using computer analysis to identify statistical patterns in large volumes of data without having been explicitly programmed how to do so. Simplistically (with apologies to purist data scientists), the system is not told ‘how’ to identify a good or right result but is trained on data (usually over repeated cycles) to identify patterns. The resulting ML model can then be used on new, previously unseen data sets to make a prediction (e.g. individual A has credit score B, document C is relevant to an issue) or to generate new data with similar characteristics. This approach can be used on an astonishing array of data types to produce insightful, useful and sometimes surprising results. Facial recognition, identification of similar (and relevant) concepts in language (NLP), and generation of new text (e.g. GPT-3, ChatGPT) or images (e.g. DALL-E 2) are increasingly impressive.
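For readers who find code easier to follow than description, the train-then-predict idea can be sketched in a few lines of Python. This is only an illustrative toy, assuming a very simple technique (a perceptron over word counts) and four invented training examples; the labels, the example texts and the function names `train` and `predict` are all hypothetical, and real systems are far larger and more sophisticated.

```python
from collections import defaultdict

def tokenize(text):
    """Split text into lower-case words (a deliberately crude tokenizer)."""
    return text.lower().split()

def train(examples, epochs=10):
    """Learn one weight per word from (text, label) pairs, label 1 or 0.

    The rule for 'relevant' is never written down by hand: weights are
    nudged over repeated cycles whenever the current guess is wrong.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            tokens = tokenize(text)
            score = bias + sum(weights[t] for t in tokens)
            predicted = 1 if score > 0 else 0
            error = label - predicted  # 0 when the guess was right
            if error:
                for t in tokens:
                    weights[t] += error
                bias += error
    return weights, bias

def predict(model, text):
    """Apply the trained weights to new, previously unseen text."""
    weights, bias = model
    score = bias + sum(weights.get(t, 0.0) for t in tokenize(text))
    return 1 if score > 0 else 0

# Invented training set: 1 = relevant to a delivery dispute, 0 = not relevant.
EXAMPLES = [
    ("delivery schedule for the contract", 1),
    ("penalties for late delivery", 1),
    ("office party catering menu", 0),
    ("staff parking arrangements", 0),
]

model = train(EXAMPLES)
print(predict(model, "late delivery penalties"))
print(predict(model, "catering for the office party"))
```

The point of the sketch is that the behaviour comes entirely from the training examples: change the examples and the same code will reach different conclusions.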

The question arises: could you train a model to review a bid and score it against an evaluation matrix? If not, what use could ML be put to in assisting evaluators?

Could you evaluate tenders using AI?

There are practical and legal hurdles to using ML to evaluate tenders. A system would need to be trained to recognise patterns, such as the ‘good’ or ‘right’ answers for each evaluation score (an approach known as supervised learning). Aside from the philosophical question of whether there is a ‘right’ answer in this respect, it is hard to see where a training set could come from for evaluation of a bespoke procurement requirement.

The nature of ML systems itself also throws up some legal challenges for application in procurement evaluation. Although there are ways of examining the system’s algorithm to understand how it reached its output (‘Explainable’ or ‘Interpretable’ AI), in essence that process can be hard to understand or explain to non-specialist lawyers. Judges may be wary of accepting apparently ‘black box’ analysis, particularly where there will likely be an absence of traditional ‘records’ of evaluation.

The outcome also depends upon the model’s training. Consequently, while many will think that a computer cannot be biased, its training, model selection and use can be, and the outcome can therefore build in biases from the training set or process. Demonstrating objectivity therefore becomes difficult.

An ML system cannot take responsibility for, and explain, its reasoning like a human evaluator. It has no ‘expertise’ to apply beyond that already built into its training set, nor any wider appreciation of the needs of the buyer in the particular case (although this can in principle reduce the risk of it taking into account matters which were not in the tender documents).

Some of these issues may be surmountable as judges gain greater exposure to, and comfort in, the operation of ML-assisted evaluation. For example, computer-assisted review (itself a specific machine learning tool) in disclosure was initially treated with caution but is now increasingly understood and welcomed. With familiarity, more trust may arise. Nonetheless, at the moment, the above constraints suggest independent evaluation will be hard to square with current procurement law expectations of evaluation.

It may therefore be a while before such independent evaluation becomes realistic. A more immediate question is what ML tools might be used to help human evaluators discharge their role.

Scope to use ML tools to assist human evaluators

Evaluators face a difficult task. They have (amongst other things) to identify relevant information from (in some cases) large amounts of tendered information, compare it to specifications and evaluation matrices, apply their expertise, potentially ask clarification questions and reach a consensus view of which description of a score or band a tender fulfils.

There is scope here for ML to assist. NLP in particular (which is trained on wider language databases so need not be bespoke trained for an individual procurement) could be used to identify where a bid document referred to certain concepts (including variations in language, synonyms etc) so an evaluator was less likely to miss relevant material. NLP could also be used to identify potential gaps in a bid – ‘the requirements included X but X is not apparent in the submission.’ Similarly, when examples are required, ML could be used to suggest whether those examples have missed any of the requirements of the specification such that the evaluator should consider the importance of such omission.
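As a rough illustration of the ‘find the concept’ and ‘spot the gap’ ideas above, the following Python sketch flags where a bid mentions each required concept and which concepts are not apparent. It is a deliberately simple stand-in: it uses a hand-written table of wording variants rather than a trained NLP model, and the concept names, variants, sample bid text and function names (`find_mentions`, `find_gaps`) are all hypothetical.

```python
import re

# Hypothetical requirement concepts, each with wording variants to look for.
# A real NLP tool would recognise related language without a hand-built list.
CONCEPTS = {
    "data protection": ["data protection", "gdpr", "privacy"],
    "business continuity": ["business continuity", "disaster recovery"],
    "social value": ["social value", "community benefit"],
}

def find_mentions(bid_text, concepts):
    """Return {concept: [sentences that mention it]} for every concept."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", bid_text) if s.strip()]
    mentions = {concept: [] for concept in concepts}
    for sentence in sentences:
        lowered = sentence.lower()
        for concept, variants in concepts.items():
            # Whole-word match on any variant of the concept.
            if any(re.search(r"\b" + re.escape(v) + r"\b", lowered) for v in variants):
                mentions[concept].append(sentence)
    return mentions

def find_gaps(mentions):
    """List the concepts for which no mention was found in the bid."""
    return [concept for concept, hits in mentions.items() if not hits]

# Invented bid extract for illustration only.
BID = (
    "We comply with GDPR in all processing. "
    "Our disaster recovery plan is tested annually. "
    "Pricing is fixed for three years."
)

mentions = find_mentions(BID, CONCEPTS)
for concept, hits in mentions.items():
    print(concept, "->", hits)
print("Not apparent in the submission:", find_gaps(mentions))
```

The output is a prompt for the evaluator, not a finding: a concept listed as ‘not apparent’ may simply be expressed in words the tool did not recognise, which is why the evaluator still needs to read the bid.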

AI systems may be developed to automate repetitive and well-defined tasks typically undertaken by humans, for example, checking against external data sources whether a bidder holds required certification or meets financial criteria. In principle text generation tools could be used to draft a hypothetical set of evaluator’s reasons against which the evaluator can ‘sense-check’ their own reasons. In each of the above examples, the evaluator remains at the centre of the process using the ML tool to draw down or identify relevant information. Importantly that evaluator can then check their understanding against what is identified rather than ceding responsibility for evaluation to the ‘system.’
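The ‘repetitive and well-defined’ checks mentioned above need not involve ML at all and can be sketched as simple rules. The example below is a minimal sketch under stated assumptions: the certification register is a hard-coded dictionary standing in for an external data source, and the bidder names, thresholds and the `check_bidder` function are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an external certification register.
CERT_REGISTER = {
    "Acme Ltd": {"ISO 9001", "ISO 27001"},
    "Bolt plc": {"ISO 9001"},
}

@dataclass(frozen=True)
class Criteria:
    """Pass/fail selection criteria taken from the tender documents."""
    required_certs: frozenset
    min_turnover: float

def check_bidder(name, declared_turnover, criteria, register=CERT_REGISTER):
    """Return a list of issues for a human to review (empty if none found)."""
    held = register.get(name, set())
    missing = sorted(criteria.required_certs - held)
    issues = [f"missing certification: {cert}" for cert in missing]
    if declared_turnover < criteria.min_turnover:
        issues.append(
            f"turnover {declared_turnover:,.0f} below minimum {criteria.min_turnover:,.0f}"
        )
    return issues

criteria = Criteria(frozenset({"ISO 9001", "ISO 27001"}), 1_000_000)
print(check_bidder("Acme Ltd", 2_000_000, criteria))
print(check_bidder("Bolt plc", 500_000, criteria))
```

The function returns issues for review rather than a pass or fail decision, which keeps the evaluator at the centre of the process as described above.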

Once the evaluator has reached their conclusion, an ML tool could be used to check the evaluator’s reasons for potential errors as a quality assurance process: ‘your evaluator reasons say that the bid did not include Y but the submission appears to refer to Y’. The risks of conducting such an exercise, and of disclosing its outcome, should be considered first.
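That quality assurance check can be pictured with a short Python sketch. It assumes the evaluator’s reasons have already been reduced to a list of items recorded as missing, and it uses plain text matching rather than a trained model; the sample submission, the item names and the `flag_possible_errors` function are hypothetical.

```python
import re

def flag_possible_errors(claimed_missing, submission):
    """Return (item, sentence) pairs where the submission appears to refer
    to something the evaluator recorded as not included."""
    sentences = re.split(r"(?<=[.!?])\s+", submission.strip())
    flags = []
    for item in claimed_missing:
        # Case-insensitive whole-phrase match for the item.
        pattern = re.compile(r"\b" + re.escape(item) + r"\b", re.IGNORECASE)
        for sentence in sentences:
            if pattern.search(sentence):
                flags.append((item, sentence))
    return flags

# Invented submission extract and evaluator findings for illustration only.
SUBMISSION = (
    "Our team holds ISO 27001 certification. "
    "We will provide a mobilisation plan within ten days."
)
CLAIMED_MISSING = ["mobilisation plan", "exit strategy"]

for item, sentence in flag_possible_errors(CLAIMED_MISSING, SUBMISSION):
    print(f"Reasons say '{item}' was not included, but the submission says: {sentence}")
```

A flag here means only that the wording appears in the submission. Whether the passage actually satisfies the requirement remains a judgement for the evaluator.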

There is an exciting opportunity to identify and learn to use ML tools to support evaluators in demonstrating that they have done their jobs properly. Judges are likely to want to probe, and be reassured about, exactly how this exercise has taken place, and tender documents would do well to state, when they are issued, that such tools will be used as part of the process (to avoid objections later). However, provided there are clear reasons and responsibility for good quality analysis is still taken by the evaluator, in principle there seems no reason why an evaluator should not use the powerful tools available. It is no more than an extension of the technology we are already familiar with.

And more widely…

There are likely to be other (non-evaluation) procurement uses of ML and AI as they become increasingly ubiquitous technologies. Managing bidder contracts, design of objectives, preparation of specifications and design of scoring matrices (in principle, future evaluation criteria might also be designed to maximise the opportunity for automatic ML analysis) and comparison of the ‘success’ of procured contracts could all potentially be assisted by ML software as it develops. The reality is that the ‘tool’ is developing and, like other technologies before it, will start to integrate, to a greater or lesser extent, into many aspects of day-to-day business. Public procurement will be included in that.

Understanding of how and when ML systems can be used will grow. The UK has its AI strategy, including growing the talent pool needed for AI development. Investments in AI in business processes and support, both in terms of value and number, have been growing over the last ten years, with significant increases in 2021 and 2022.

The technology is developing quickly: GPT-2, released in 2019, has 1.5 billion parameters; GPT-3, released in 2020, has 175 billion parameters. The use cases are expanding as the technology improves and its potential is better understood. What that potential looks like in procurement, and whether the legal frameworks require updating, will undoubtedly change too, but can be anticipated in advance.

To discuss the future of procurement and AI law, please contact Ian Tucker or Tom Whittaker.

[1] Note that AI is defined in the National Security and Investment Act 2021 (Notifiable Acquisition) (Specification of Qualifying Entities) Regulations 2021/1264, the EU AI Act and Canada’s Artificial Intelligence and Data Act.
