Abstract
OBJECTIVES: Electronic health records (EHRs) provide substantial resources for observational studies, yet present significant challenges in safeguarding patient privacy while maintaining research quality. Differential privacy (DP) offers a quantifiable privacy guarantee; however, its impact on observational studies remains underexplored. We empirically evaluated the effects of DP, across varying values of its privacy parameter epsilon, on case-control analysis outcomes using EHR data. This study aims to inform DP parameter selection and to examine how study characteristics influence differentially private observational studies.
MATERIALS AND METHODS: We assessed the effects of DP on a case-control study of 1-year asthma exacerbations, including 22 165 participants with a history of asthma from UK Biobank linked to EHR data. Odds ratios (ORs) for sociodemographic factors and comorbidities were analyzed using adjusted and propensity score-matched models across epsilon values.
RESULTS: DP influenced the magnitude, direction, and statistical significance of ORs, occasionally resembling patterns of misclassification, residual confounding, and false-positive bias. Rare and imbalanced covariates showed greater OR variability, especially in matched studies. Epsilon values below ln(2) led to noticeable OR fluctuations.
DISCUSSION: The impact of DP on ORs and the selection of an optimal epsilon depend on sample size, covariate prevalence, confounders, case-to-control ratios in propensity score matching, mitigation of random seed p-hacking, and trust models.
CONCLUSION: The effects of DP on ORs are highly context-dependent. In this study, epsilon values below ln(2) led to unstable ORs across random seeds. Averaging results or using predetermined seeds may help reduce variability and mitigate p-hacking.