Abstract
Selection bias in genome-wide association studies (GWASs) due to volunteer-based sampling (volunteer bias) is poorly understood. The UK Biobank (UKB), one of the largest and most widely used cohorts, is highly selected. Using inverse probability (IP) weights, we estimate inverse probability weighted GWAS (WGWAS) to correct GWAS summary statistics in the UKB for volunteer bias. Our IP weights were estimated using UK Census data (the largest source of population-representative data) made representative of the UKB's sampling population. These weights have a substantial SNP-based heritability of 4.8% (s.e. 0.8%), suggesting they capture volunteer bias in GWAS. Compared to GWAS, WGWAS across ten phenotypes yields larger SNP effect sizes, larger heritability estimates, and altered gene-set tissue expression, despite decreasing the effective sample size by 62% on average. The impact of volunteer bias on GWAS results varies by phenotype: traits related to disease, health behaviors, and socioeconomic status are most affected. We recommend that GWAS consortia provide population weights for their data sets, or use population-representative samples.
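
To illustrate the idea behind an IP-weighted association test, the following is a minimal sketch, not the pipeline used in this study. It assumes one SNP, one phenotype, and a vector of precomputed IP weights, and it ignores covariates. The function names `weighted_snp_effect` and `kish_effective_n` are hypothetical, and Kish's approximation is used here only as one common way to summarize how unequal weights reduce the effective sample size; the definition used for the 62% figure above may differ.

```python
import numpy as np


def kish_effective_n(w):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()


def weighted_snp_effect(g, y, w):
    """Weighted least-squares slope of phenotype y on genotype g.

    Equivalent to regressing y on an intercept and g, with each
    individual weighted by w (assumed here to be their IP weight).
    """
    g, y, w = (np.asarray(a, dtype=float) for a in (g, y, w))
    g_centered = g - np.average(g, weights=w)  # weighted mean-centering
    y_centered = y - np.average(y, weights=w)
    return np.sum(w * g_centered * y_centered) / np.sum(w * g_centered ** 2)


# Toy data (illustrative only): genotypes coded 0/1/2, arbitrary weights.
g = np.array([0, 1, 2, 0, 1, 2])
y = 2.0 * g + 1.0
w = np.array([0.5, 1.0, 3.0, 0.5, 2.0, 1.0])

beta_w = weighted_snp_effect(g, y, w)
n_eff = kish_effective_n(w)
```

With equal weights, the slope reduces to the ordinary least-squares estimate and the effective sample size equals the number of individuals. The more unequal the weights, the smaller the effective sample size, which is the precision cost that weighting trades for reduced bias.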