Abstract
The UK Biobank is the most widely used dataset for genome-wide association studies (GWAS). GWAS of sex, which essentially tests for sex differences in minor allele frequencies (sdMAF), has identified autosomal SNPs with significant sdMAF, including in the UK Biobank, but these analyses excluded the X chromosome. Our recent report, based on short-read sequencing of other datasets, identified multiple regions on the X chromosome with significant sdMAF. Here we performed a whole-genome sdMAF analysis of ~410k white British individuals from the UK Biobank, using array-genotyped, imputed, or exome sequencing data. We observed marked sdMAF on the X chromosome, particularly at the boundaries between the pseudo-autosomal regions (PAR) and the non-PAR (NPR), as well as throughout the NPR, consistent with our earlier report. A small fraction of autosomal SNPs also showed significant sdMAF. In the centrally imputed data, which relied mostly on low-coverage whole genome sequence, 2.1% of NPR SNPs showed significant sdMAF. The whole exome sequencing data also displayed sdMAF on the X chromosome, including at some NPR SNPs with heterozygous genotype calls in males. Genotyping, sequencing, and imputation of X chromosomal SNPs require further attention to ensure data integrity for downstream association analyses.
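To make the notion of sdMAF concrete, the following is a minimal sketch of one way a sex difference in allele frequency could be tested at a single SNP. It is not the test used in this study: it is a plain two-proportion z-test on allele counts, and the function names (`allele_counts`, `sdmaf_z_test`) and the genotype-count input format are assumptions made for illustration. The only X-specific detail it encodes is that males carry a single allele at NPR SNPs, so a heterozygous male call there is treated as an error.

```python
import math


def allele_counts(n_hom_ref, n_het, n_hom_alt, hemizygous=False):
    """Return (alt_allele_count, total_allele_count) for one sex.

    Hypothetical helper: inputs are genotype counts for a biallelic SNP.
    """
    if hemizygous:
        # Males at X-chromosome NPR SNPs carry one allele, so a
        # heterozygous call is not biologically expected.
        if n_het != 0:
            raise ValueError("heterozygous call at a hemizygous SNP")
        return n_hom_alt, n_hom_ref + n_hom_alt
    n_individuals = n_hom_ref + n_het + n_hom_alt
    return n_het + 2 * n_hom_alt, 2 * n_individuals


def sdmaf_z_test(female_counts, male_counts, x_npr=False):
    """Two-proportion z-test for a female-minus-male allele frequency difference.

    Illustrative assumption only; returns (difference, z, two_sided_p).
    """
    f_alt, f_tot = allele_counts(*female_counts)
    m_alt, m_tot = allele_counts(*male_counts, hemizygous=x_npr)
    p_f = f_alt / f_tot
    p_m = m_alt / m_tot
    pooled = (f_alt + m_alt) / (f_tot + m_tot)
    se = math.sqrt(pooled * (1 - pooled) * (1 / f_tot + 1 / m_tot))
    if se == 0:
        # Monomorphic in both sexes: no difference can be tested.
        return p_f - p_m, 0.0, 1.0
    z = (p_f - p_m) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_f - p_m, z, p_value


# Genotype counts are (hom_ref, het, hom_alt); the numbers are made up.
diff, z, p = sdmaf_z_test((25, 50, 25), (50, 0, 50), x_npr=True)
```

In this sketch, a male heterozygous call at an NPR SNP raises an error rather than being counted, which mirrors the kind of genotype-call problem noted above for the exome sequencing data. A real analysis would need to decide how to handle such calls and would typically use a test that accounts for departures from Hardy-Weinberg equilibrium.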