About
Our motivation for studying the UK Biobank dataset is the observation that traditional methods for detecting indicators of a particular disease state do not translate well to ethnic minorities. We intend to develop algorithms that learn genetic characteristics from large datasets, and then to develop transfer learning methods for smaller datasets, i.e., understudied populations. We will address population specificity, with a particular focus on the population of the UAE.
For benchmarking purposes, we aim to study five diseases: coronary artery disease (CAD), atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer. The UK Biobank includes data from ethnically diverse participants, which is a crucial requirement for our purposes.
In particular, we will first perform population stratification on the dataset, clustering participants so that ethnic minorities can be identified. Our algorithms will then be trained on a very large dataset (representing the majority) while leaving out one or more ethnic minorities. We will then apply the trained algorithms, together with the transfer learning methods, to the left-out minority. In a final step, we are interested in applying the established algorithm to local datasets.
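The leave-out step can be illustrated with a minimal sketch. It assumes that a prior stratification step has already assigned each participant a cluster label. The function name leave_group_out_splits and the toy labels "A", "B", "C" are hypothetical and chosen only for illustration; they do not describe the project's actual pipeline or any UK Biobank field.

```python
from collections import Counter


def leave_group_out_splits(groups, majority=None):
    """Yield (held_out, train_idx, test_idx) for each non-majority cluster.

    groups: one cluster label per participant (assumed to come from a
    prior population-stratification step).
    majority: label of the majority cluster; defaults to the largest one.
    """
    counts = Counter(groups)
    if majority is None:
        majority = counts.most_common(1)[0][0]  # largest cluster
    # Hold out each minority cluster in turn; train on everyone else.
    for held_out in sorted(g for g in counts if g != majority):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Toy cluster labels, invented for illustration only.
groups = ["A", "A", "A", "A", "B", "B", "C"]
splits = list(leave_group_out_splits(groups))
for held_out, train_idx, test_idx in splits:
    print(held_out, len(train_idx), len(test_idx))
```

Each split keeps the held-out cluster entirely out of training, so performance on it reflects how well a model trained on the remaining participants carries over to an unseen population.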