Privacy-preserving Synthetic Twins for Medical Data With a Focus on Covid-19 Studies
Lead Institution:
Aalto University
Principal investigator:
Professor Samuel Kaski
About
Machine learning allows rapid discovery of important features from data, which is especially valuable during times of crisis such as the ongoing SARS-CoV-2 pandemic. In general, these methods provide better utility when applied to large amounts of data. However, the data is often spread across multiple parties and is subject to strict privacy regulations because of the sensitive information it contains.
We aim to make widespread sharing of research data possible by developing techniques for releasing a synthetic twin instead of the original sensitive data. The synthetic twin shares the statistical properties of the original data, while preserving the anonymity of the individuals in the original data set. Our approach is based on differentially private learning, which prevents re-identification from the learning outcomes by limiting the extent of identifying characteristics learned from the data. We generate the synthetic data from a probabilistic model trained using such techniques.
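The idea of releasing samples from a privately trained model, rather than the data itself, can be illustrated with a deliberately simple sketch. It is not the project's method: it assumes a single categorical attribute, uses a Laplace-noised histogram as the "probabilistic model", and assumes neighbouring data sets differ by adding or removing one individual. The function name `dp_synthetic_twin` and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def dp_synthetic_twin(records, n_categories, epsilon, n_synthetic, rng):
    """Sample synthetic categorical records from a Laplace-noised histogram.

    Illustrative assumption: `records` holds integer category codes in
    [0, n_categories), one per individual.
    """
    counts = np.bincount(records, minlength=n_categories).astype(float)

    # Adding or removing one individual changes exactly one count by 1,
    # so the histogram has L1 sensitivity 1 and Laplace noise with
    # scale 1/epsilon yields an epsilon-differentially private release.
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=n_categories)

    # Post-processing does not weaken the privacy guarantee:
    # clip negative counts and normalise to a probability vector.
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    if total == 0.0:
        probs = np.full(n_categories, 1.0 / n_categories)
    else:
        probs = noisy / total

    # The synthetic twin is drawn from the private model only,
    # never directly from the original records.
    return rng.choice(n_categories, size=n_synthetic, p=probs)


rng = np.random.default_rng(0)
original = rng.integers(0, 4, size=1000)  # stand-in for sensitive data
twin = dp_synthetic_twin(original, n_categories=4, epsilon=1.0,
                         n_synthetic=500, rng=rng)
```

A smaller `epsilon` adds more noise, giving stronger privacy at the cost of a twin that matches the original statistics less closely. Realistic medical data has many correlated attributes, which is why a richer probabilistic model trained with differentially private learning is needed in practice.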
Developing and refining our methods will take approximately one year. The project has the potential to significantly improve data accessibility, opening possibilities for a wide variety of new medical research.