Federated and AI-driven Analytics
We are entering a phase of significant change in how data is analysed, moving from limited analysis within a single Trusted Research Environment (TRE) to on-demand, secure analysis across multiple TREs. ‘Analytics’ ranges from a simple statistical function yielding summary values, through multi-step processing workflows, to complex AI/Machine Learning systems that induce a model from training data. Federated Analytics is the ability to perform all of these analyses without the data necessarily being in a single location.
The HDR UK Driver Programmes seek to analyse data across multiple, locally controlled TREs and show strong scientific demand for infrastructure that can deliver federated capabilities. Researchers need flexibility in the range of analytics they can apply to problems but, for security reasons, arbitrary computations cannot be run in TREs. Enabling queries over multiple TREs, at either the individual or the aggregated level, lets researchers use the full range of data rather than narrow selections imposed by access constraints and technical complexity instead of scientific need.
Three factors make federation vital for research and for HDR UK:
(i) innovations in federated computational approaches, developed by HDR UK and by partners globally, are ready to be assembled;
(ii) the public are sensitive to the national movement of data into large databases for undefined purposes; and
(iii) TRE services are proliferating, increasingly localised at hospital or NHS Trust level.
Together, these factors make it not only desirable but necessary to deliver the services required for Federated Analytics and to support their use by researchers.
The Federated and AI-driven Analytics workstream will support three capabilities:
(i) Federated Cohort Discovery, which allows a user to establish the counts and distribution of available data across the TREs;
(ii) Federated Meta Analysis, in which analyses are performed within each TRE and summary statistics are returned to give overall insight; and
(iii) Federated Learning, which allows a single model to be created and adjusted across all available datasets hosted in different TREs.
It will address these needs by delivering the underlying core services that the ‘TRE in a box’ needs to communicate safely outside the TRE, and the new computational functionality that the Gateway needs to support programmatic access by and to federated analytic and discovery tools and platforms, including modern computational workflow platforms, and the application of advanced AI techniques. The workstream will work with HDR UK members, partners and Driver Programmes to pilot federated analytic algorithms and AI approaches on a test bed platform, then deploy them into the Gateway and provide mechanisms to share analytical workflows for the benefit of HDR UK and the wider community.
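As an illustration only, the three capabilities named above can be sketched in a few lines of Python. Nothing here reflects the actual Gateway or ‘TRE in a box’ services: the `Site` class, its method names, the disclosure threshold of 5 and the one-parameter model are all assumptions made for the example. The point it shows is that, in each mode, only counts, summary statistics or model parameters leave a site, never row-level records.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Site:
    """Hypothetical in-memory stand-in for one TRE (an assumption, not a real API)."""
    name: str
    ages: List[float]  # row-level data; never returned to the caller

    def cohort_count(self, min_age: float, threshold: int = 5) -> Optional[int]:
        # Cohort discovery: return a count, suppressed (None) when it is
        # below an assumed small-number disclosure threshold.
        n = sum(1 for a in self.ages if a >= min_age)
        return n if n >= threshold else None

    def summary(self) -> Tuple[int, float]:
        # Meta analysis: return only the sample size and the local mean.
        n = len(self.ages)
        return n, sum(self.ages) / n

    def local_update(self, weight: float, lr: float = 0.1) -> Tuple[int, float]:
        # Federated learning: one gradient step on local data for a toy model
        # that predicts a single constant under squared-error loss.
        n = len(self.ages)
        grad = sum(2 * (weight - a) for a in self.ages) / n
        return n, weight - lr * grad


def federated_counts(sites: List[Site], min_age: float) -> Dict[str, Optional[int]]:
    """Collect per-site cohort counts without moving any records."""
    return {s.name: s.cohort_count(min_age) for s in sites}


def pooled_mean(sites: List[Site]) -> float:
    """Combine per-site summaries into a sample-size-weighted overall mean."""
    summaries = [s.summary() for s in sites]
    total = sum(n for n, _ in summaries)
    return sum(n * m for n, m in summaries) / total


def federated_average(sites: List[Site], rounds: int = 100, weight: float = 0.0) -> float:
    """Federated averaging: each round, sites update the shared weight locally
    and the coordinator takes the sample-size-weighted mean of the results."""
    for _ in range(rounds):
        updates = [s.local_update(weight) for s in sites]
        total = sum(n for n, _ in updates)
        weight = sum(n * w for n, w in updates) / total
    return weight


if __name__ == "__main__":
    sites = [
        Site("TRE-A", [30, 40, 50, 60, 70, 80, 90]),
        Site("TRE-B", [20, 25, 65]),
    ]
    print(federated_counts(sites, min_age=50))
    print(pooled_mean(sites))
    print(federated_average(sites))
```

In this toy setting the federated model converges towards the pooled mean, which makes it easy to check that the three modes agree. Real deployments would add authentication, output checking and governance controls that are outside the scope of this sketch.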
Project Details
Start Date: 01/04/2023
End Date: 31/03/2028
Funder: Health Data Research UK
Funding amount: £3M