About
The proposed project aims to investigate the relationship between functional convergence and genetic interactions among the genetic variants that contribute to the pathophysiology of complex diseases and their comorbidities. Genome-wide association studies (GWAS) have identified thousands of disease-associated single nucleotide polymorphisms (SNPs); however, how multiple variants interact with one another remains elusive. Our pilot study suggests that disease-associated SNPs tend to have convergent functions. This study will integrate other data resources (e.g., ENCODE) to identify more functionally convergent SNPs. More importantly, we will use the UK Biobank to systematically investigate the genetic interactions of these SNPs.

The proposed project seeks to explain missing heritability by investigating genetic interactions that are hypothesized to be associated with the convergent downstream effectors of genetic variation. It aligns well with the purpose of UK Biobank in unveiling the interactions among disease genes, lifestyles, and environmental factors. It will enrich the applications of UK Biobank by incorporating other mainstream omics projects. Further, this project may foster actionable applications such as drug repositioning, personalized medicine, and novel intervention approaches.

We propose to use major omics datasets, including but not limited to results generated by the GTEx, ENCODE, and Roadmap Epigenomics projects, to identify convergent biological mechanisms among disease-associated SNPs. Then, we will use UK Biobank genotypes to investigate the genetic interactions of the prioritized SNP pairs that share convergent mechanisms, and to study the relationship between convergent biological mechanisms and genetic interactions.
We will also prioritize new disease-associated SNPs by the combined effects of their functional convergence (through external omics datasets) and their genetic interactions (through UK Biobank) with known disease-associated SNPs (from the NHGRI GWAS Catalog). Our study uses a data-driven approach; thus, it is difficult to foresee which diseases will have SNPs with prioritized convergent mechanisms. Therefore, we would like to use the whole cohort to validate the genetic interactions among the disease-associated SNPs that we prioritize. However, we only need a subset of data fields, mainly the genotype, diagnosis, lifestyle, and medication data, along with baseline population characteristics and demographics of the cohort.
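
As an illustration of what testing a genetic interaction between one prioritized SNP pair could look like, the sketch below computes a multiplicative-scale interaction odds ratio from case and control counts. It is a minimal example under stated assumptions, not the project's actual analysis plan: genotypes are collapsed to carrier status (0 or 1) for each SNP, no covariates are adjusted for, and the function name `interaction_odds_ratio` and all counts are hypothetical. The estimate is equivalent to the interaction coefficient of a saturated logistic model with two binary predictors, tested with a Wald statistic.

```python
import math


def interaction_odds_ratio(counts):
    """Estimate a multiplicative SNP-SNP interaction from a 2x2x2 table.

    counts maps (g1, g2) carrier status, each 0 or 1, to a tuple
    (n_cases, n_controls). Returns (interaction_or, z, p_value).
    Hypothetical helper for illustration only.
    """
    expected_keys = {(0, 0), (0, 1), (1, 0), (1, 1)}
    if set(counts) != expected_keys:
        raise ValueError("counts must contain exactly the four carrier strata")

    log_ratio = 0.0
    variance = 0.0
    for (g1, g2), (cases, controls) in counts.items():
        if cases <= 0 or controls <= 0:
            raise ValueError("every cell needs a positive count")
        # log OR_int = log odds(1,1) + log odds(0,0) - log odds(1,0) - log odds(0,1)
        sign = 1 if g1 == g2 else -1
        log_ratio += sign * (math.log(cases) - math.log(controls))
        # Wald variance of the log interaction OR: sum of reciprocal cell counts
        variance += 1.0 / cases + 1.0 / controls

    z = log_ratio / math.sqrt(variance)
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(log_ratio), z, p_value


# Made-up counts: carriers of both SNPs have a higher odds of disease than
# the product of the two single-SNP odds ratios would predict.
example_counts = {
    (0, 0): (100, 400),
    (1, 0): (60, 120),
    (0, 1): (50, 100),
    (1, 1): (90, 45),
}
example_or, example_z, example_p = interaction_odds_ratio(example_counts)
```

An interaction odds ratio near 1 would indicate that the joint effect of the two SNPs is consistent with the product of their individual effects. In a cohort-scale analysis, a regression model with covariates (for example age, sex, and ancestry components) and correction for the number of SNP pairs tested would be needed; those steps are outside this sketch.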