About
With the decreasing cost of DNA sequencing, large databases of human genomes are being collected to boost health-related research, leading to the establishment of biobanks containing both genetic and phenotypic data for hundreds of thousands of individuals. Analysing datasets of this kind urgently requires accurate and scalable statistical methods able to process large amounts of data.
Unraveling the genetic basis of complex traits and diseases in humans requires a deep understanding of which genetic material is shared between individuals and how it is shared. The statistical models able to capture this sharing do not cope well with large amounts of data, and they remain a technical bottleneck that prevents the field from fully exploiting the available data. In this research project, we propose to develop a series of new statistical methods to model this sharing between individuals, with the specific goals of estimating (i) how alleles segregate on chromosomes (haplotype estimation), (ii) which unobserved rare mutations are carried by individuals (genotype imputation) and (iii) which parent each allele is inherited from (parent-of-origin inference).
With these three pieces of information in hand, we will then show how they can help to further characterize the genetic architecture of multiple UK Biobank complex traits, notably at the levels of rare variant effects and parent-of-origin effects (where the effect of an allele differs depending on the parent it is inherited from). We aim to show the prevalence of these two types of genetic effects in humans across multiple commonly studied complex traits related to medical history, sociodemography, lifestyle, anthropometry, cognition and blood/urine biomarkers.
This work is expected to deliver a new set of tools able to leverage large amounts of genetic data, as is typically the case in biobanks, and to deepen our understanding of the genetic basis of multiple complex traits. Any new statistical tools and genetic loci of interest will be made freely available to the scientific community in accordance with the UK Biobank data privacy policy.