Name: Genotype imputation and genetic association studies using UK Biobank data
Size: 546,936
MD5: b0dbdfc8bca91ae4b324b2f08fd120de
This document describes the analysis carried out to perform genotype imputation for the interim release of the UK Biobank genotype data. It also provides advice on using the imputed data to carry out genome-wide association studies (GWAS) or to extract genotypes for use as covariates in other types of association study.
Genotype imputation is the process of predicting genotypes that are not directly assayed in a sample of individuals. A reference panel of haplotypes at a dense set of SNPs, indels and structural variants is used to impute genotypes into a study sample of individuals who have been genotyped at a subset of the SNPs. These 'in-silico' genotypes can then be used to boost the number of SNPs that can be tested for association. This increases the power of the study, improves the ability to resolve or fine-map the causal variants, and facilitates meta-analysis. The result of the imputation process is a dataset with 73,355,667 SNPs, short indels and large structural variants in 152,249 individuals. The process of imputation is divided into two steps:
- pre-phasing;
- imputation.
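To make the idea of imputation concrete, the following is a deliberately simplified sketch and not the method used for the UK Biobank release. It assumes the study haplotype has already been phased (the pre-phasing step), that alleles are coded 0/1, and that untyped sites are marked `None`. It then copies the missing alleles from the reference haplotype that agrees best at the typed sites. Production imputation software uses probabilistic models over the whole reference panel rather than a single best match. The function name and the toy data are hypothetical.

```python
def impute_haplotype(study_hap, reference_panel):
    """Fill untyped sites (None) by copying from the closest reference haplotype.

    Closeness is the number of mismatches at the typed sites.
    Ties are resolved by taking the first such haplotype in the panel.
    """
    # Indices of the sites that were directly genotyped in the study sample.
    typed = [i for i, allele in enumerate(study_hap) if allele is not None]

    def mismatches(ref_hap):
        # Count disagreements between the reference and study haplotype
        # at the typed sites only.
        return sum(ref_hap[i] != study_hap[i] for i in typed)

    best = min(reference_panel, key=mismatches)

    # Keep observed alleles; copy the reference allele where none was observed.
    return [best[i] if allele is None else allele
            for i, allele in enumerate(study_hap)]


# Hypothetical toy data: 3 reference haplotypes at 5 variant sites.
reference_panel = [
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1],
]

# A pre-phased study haplotype typed at sites 0, 2 and 4 only.
study_hap = [1, None, 0, None, 1]

imputed = impute_haplotype(study_hap, reference_panel)
print(imputed)
```

Here the second reference haplotype matches the study haplotype at all three typed sites, so its alleles at sites 1 and 3 are copied into the gaps.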
This resource can be downloaded or viewed using the link: impute_ukb_v1.pdf
If you have wget available (typically on Linux systems), then you can also obtain a copy using the command:
wget -nd biobank.ndph.ox.ac.uk/ukb/ukb/docs/impute_ukb_v1.pdf
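The Size and MD5 values listed for this resource can be used to confirm that the file downloaded intact. The sketch below is one way to do this in Python using only the standard library. It assumes the Size field is a byte count, which this page does not state, and the function names `file_md5` and `check_download` are illustrative.

```python
import hashlib
import os

# Values copied from the Size and MD5 fields listed for this resource.
EXPECTED_SIZE = 546936  # assumption: the Size field is a count of bytes
EXPECTED_MD5 = "b0dbdfc8bca91ae4b324b2f08fd120de"


def file_md5(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_download(path, expected_size=EXPECTED_SIZE, expected_md5=EXPECTED_MD5):
    """Return True only if both the file size and the MD5 digest match."""
    if os.path.getsize(path) != expected_size:
        return False
    return file_md5(path) == expected_md5.lower()
```

After running the wget command, calling `check_download("impute_ukb_v1.pdf")` in the download directory should return `True` if the file matches the listed values. A `False` result suggests an incomplete or altered download, or that the byte-count assumption does not hold.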