Publication 8006

Title:	A transfer learning approach based on random forest with application to breast cancer prediction in underrepresented populations.
Journal:	Biocomputing
Published:	1 Jan 2023
Pubmed:	https://pubmed.ncbi.nlm.nih.gov/36540976/
Citations:	4 (4 in last 2 years) as of 8 Aug 2024

Abstract

Recent large-scale biobanks collect high-quality, data-rich samples, but participants from minority and disadvantaged groups are underrepresented in them. This underrepresentation limits the use of biobank data for developing disease risk prediction models that generalize to diverse populations, and it may exacerbate existing health disparities. This study addresses the problem by proposing a transfer learning framework based on random forest models (TransRF). TransRF incorporates risk prediction models trained in a source population to improve prediction performance in an underrepresented target population with a limited sample size. It is built as an ensemble of multiple transfer learning approaches, each covering a particular type of similarity between the source and target populations; the ensemble is shown to be robust and applicable across a broad spectrum of scenarios. In extensive simulation studies, TransRF outperforms several benchmark approaches across different data-generating mechanisms. We illustrate the feasibility of TransRF by applying it to UK Biobank data to build breast cancer risk assessment models for African-ancestry women and for South Asian women.
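
The abstract does not specify how TransRF combines its component transfer learning approaches, so the following is only a minimal sketch of one generic way to ensemble candidate risk models: weight each candidate by its Brier score on held-out target-population data. The function names, the softmax weighting rule, the temperature parameter, and the candidate names are all assumptions made for illustration, and the probabilities are hard-coded toy values rather than results from the paper. In practice each candidate's probabilities would come from a fitted random forest.

```python
import math

def brier_score(y_true, y_prob):
    """Mean squared error between binary labels and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def ensemble_weights(y_val, candidate_probs, temperature=0.05):
    """Softmax weights that favour candidates with a lower validation Brier score.

    The weighting rule and `temperature` are illustrative assumptions,
    not the rule used by TransRF.
    """
    scores = {name: brier_score(y_val, probs) for name, probs in candidate_probs.items()}
    best = min(scores.values())
    raw = {name: math.exp(-(score - best) / temperature) for name, score in scores.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

def ensemble_predict(weights, candidate_probs):
    """Weighted average of the candidates' predicted probabilities, per sample."""
    n_samples = len(next(iter(candidate_probs.values())))
    return [
        sum(weights[name] * candidate_probs[name][i] for name in weights)
        for i in range(n_samples)
    ]

# Toy validation labels from the target population (made up for illustration).
y_val = [1, 0, 1, 0, 1, 0]

# Hypothetical predicted probabilities from three candidate models.
candidates = {
    "source_only": [0.6, 0.5, 0.6, 0.4, 0.5, 0.5],  # trained on source data only
    "target_only": [0.7, 0.4, 0.5, 0.6, 0.6, 0.3],  # trained on the small target sample
    "transfer":    [0.8, 0.2, 0.7, 0.3, 0.8, 0.2],  # target model informed by the source model
}

weights = ensemble_weights(y_val, candidates)
combined = ensemble_predict(weights, candidates)
```

With these toy inputs the candidate with the lowest validation error receives most of the weight. A lower temperature concentrates the weight further on the best candidate, while a higher one moves the ensemble toward a plain average.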

7 Keywords

Black People
Breast Neoplasms
Computational Biology
Female
Humans
Machine Learning
Random Forest

3 Authors

Tian Gu
Yi Han
Rui Duan
