Abstract
Background: Rare diseases collectively affect up to 10% of the population, yet they often lack effective treatment, and their pathophysiology is typically poorly understood. Major challenges include suboptimal phenotype mapping and limited statistical power. Population biobanks, such as the UK Biobank, recruit many individuals who can be affected by rare diseases; however, investigation into their utility for rare disease research remains limited. We hypothesized that the UK Biobank can be used as a unique population assay for rare diseases in the general population.

Methods: We constructed a consensus mapping between ICD-10 codes and ORPHA codes for rare diseases, identified individuals with each rare condition in the UK Biobank, and investigated their age at recruitment, sex bias, and comorbidity distributions. Using exome sequencing data from 167,246 individuals of European ancestry, we performed genetic association testing with SAIGE, which controls for case-control imbalance, to identify potential rare pathogenic variants for each disease.

Results: Using our mapping approach, we identified and characterized 420 rare diseases affecting 23,575 individuals in the UK Biobank. Significant genetic associations included JAK2 V617F for immune thrombocytopenic purpura (p = 1.24 × 10⁻¹³) and a novel CALR loss-of-function variant for essential thrombocythemia (p = 1.59 × 10⁻¹³). We constructed an interactive resource highlighting demographic information (http://www-personal.umich.edu/~mattpat/rareDiseases.html) and demonstrated transferability by applying our mapping to a medical claims database.

Conclusions: Enhanced disease mapping and increased power from population biobanks can elucidate the demographics and genetic associations for rare diseases.</p>