Publication 13193

Title:	Synthetic data for privacy-preserving clinical risk prediction
Journal:	Scientific Reports
Published:	27 Oct 2024
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/39463411/
DOI:	https://doi.org/10.1038/s41598-024-72894-y
URL:	https://www.nature.com/articles/s41598-024-72894-y.pdf

Abstract

Synthetic data promise privacy-preserving data sharing for healthcare research and development. Compared with other privacy-enhancing approaches - such as federated learning - analyses performed on synthetic data can be applied downstream without modification, such that synthetic data can act in place of real data for a wide range of use cases. However, the role that synthetic data might play in all aspects of clinical model development remains unknown. In this work, we used state-of-the-art generators explicitly designed for privacy preservation to create a synthetic version of ever-smokers in the UK Biobank before building prognostic models for lung cancer under several data release assumptions. We demonstrate that synthetic data can be effectively used throughout the medical prognostic modeling pipeline even without eventual access to the real data. Furthermore, we show the implications of different data release approaches on how synthetic biobank data could be deployed within the healthcare system.

9 Keywords

Biological Specimen Banks
Female
Humans
Information Dissemination
Lung Neoplasms
Male
Privacy
Prognosis
United Kingdom

6 Authors

Zhaozhi Qian
Thomas Callender
Bogdan Cebere
Sam M. Janes
Neal Navani
Mihaela van der Schaar

1 Application

Application ID	Title
77097	Synthetic data for lung cancer prognostic modelling
