STANdards for data Diversity, INclusivity and Generalisability

The Opportunity of Artificial Intelligence (AI) in Healthcare

The power of AI lies in its ability to learn patterns from large amounts of data, in a way that exceeds human abilities. However, this also means the reliability of an AI algorithm is closely linked to the data it is trained on, and it may perform poorly when confronted with new data examples, a failure of ‘AI generalisability’. To be sure that algorithms work for everybody, we need to test them on datasets that represent the diverse range of people they are intended to be used for.

This project will develop standards that ensure datasets for training and testing AI systems are diverse, inclusive, and promote AI generalisability. Patients, the public, health professionals, researchers, ethicists and policy-makers will work together to agree on what the essential criteria for datasets should be: both who is represented (dataset composition) and how this information is provided (dataset reporting). We will develop new recommendations for AI datasets, to help gatekeepers (regulators, commissioners, policy-makers and health data institutions) assess whether datasets, and the algorithms developed from them, are suitable for the target population. This means we will have better datasets for the development and testing of AI and, in the long term, better health outcomes for all, in particular for minoritised populations. By getting the data foundation right, STANDING Together ensures that ‘no-one is left behind’ as we seek to unlock the benefits of AI in healthcare.

The Problem
To build AI healthcare technologies which benefit all patients, we need datasets which represent the diverse range of people they are intended to be used in. Unfortunately, health datasets often do not adequately represent minoritised populations.

Our Mission
We believe health datasets should be curated with inclusivity and diversity in mind. We are developing standards to ensure AI healthcare technologies are supported by adequately representative data, relating to how AI datasets should be composed (‘who’ is represented in the data) and transparency around the data composition (‘how’ they are represented).

Project Team

Project Details

Start Date: 01/12/2021

End Date: 01/12/2023

Funders: The Health Foundation and the NHS AI Lab
