Abstract
Genetic prediction of male pattern baldness (MPB) is important for both science and society. Previous genetic MPB prediction models were limited by sparse marker coverage, small sample sizes, and/or data dependency across the different analytical steps. Here, we present novel models for the genetic prediction of MPB based on a large set of markers and on large, independent subsamples drawn from 187,435 European subjects. From a list of 270 SNPs previously associated with MPB, we selected, in 55,573 males of the UK Biobank Study (UKBB), 117 SNP predictors within 85 distinct loci. Using these 117 SNPs, with and without age as an additional predictor, we trained prediction models with different methods in a non-overlapping subset of 104,694 UKBB males and tested them in a further non-overlapping subset of 26,177 UKBB males. Estimates of prediction accuracy were similar across methods: with age, AUCs ranged from 0.725-0.728 for severe, 0.631-0.635 for moderate, 0.598-0.602 for slight, and 0.708-0.711 for no hair loss, and were slightly lower without age; prediction of any versus no hair loss gave AUCs of 0.690-0.711 with age and slightly lower values without. External validation in an MPB dataset from the Bonn Study enriched for early-onset cases (N = 991) showed improved prediction accuracy without considering age, e.g., an AUC of 0.830 for no vs. any hair loss. Because of the large number of markers and the large, independent datasets used for the different analytical steps, the newly presented genetic prediction models are the most reliable currently available for MPB or any other human appearance trait.</p>
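The AUC values reported above summarize how well a model's predicted probabilities separate subjects in one hair loss category from all other subjects. The following is a minimal sketch, assuming each fitted model outputs one probability per category for every test subject. It computes one-vs-rest AUCs with the rank-based (Mann-Whitney U) estimator. The category labels, function names, and toy probabilities are hypothetical illustrations and do not reproduce the authors' evaluation code.

```python
from typing import Dict, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: Mann-Whitney U divided by (n_pos * n_neg).

    labels are 1 for the positive class and 0 otherwise; tied scores
    receive their average rank, which counts each tie as half a win.
    """
    pairs = sorted(zip(scores, labels), key=lambda t: t[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative subject")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def one_vs_rest_auc(
    probs: List[Dict[str, float]], categories: Sequence[str], observed: Sequence[str]
) -> Dict[str, float]:
    """AUC per category, treating that category as positive and all others as negative."""
    return {
        c: auc([p[c] for p in probs], [1 if o == c else 0 for o in observed])
        for c in categories
    }


# Hypothetical predicted probabilities for four test subjects.
CATEGORIES = ["none", "slight", "moderate", "severe"]
probs = [
    {"none": 0.70, "slight": 0.20, "moderate": 0.05, "severe": 0.05},
    {"none": 0.20, "slight": 0.50, "moderate": 0.20, "severe": 0.10},
    {"none": 0.10, "slight": 0.20, "moderate": 0.50, "severe": 0.20},
    {"none": 0.05, "slight": 0.10, "moderate": 0.25, "severe": 0.60},
]
observed = ["none", "slight", "moderate", "severe"]

per_category = one_vs_rest_auc(probs, CATEGORIES, observed)

# "Any versus no hair loss": score each subject by 1 - P(no hair loss).
any_vs_none = auc(
    [1 - p["none"] for p in probs], [0 if o == "none" else 1 for o in observed]
)
```

With real test data, the same functions would be applied to the held-out subjects only, so that the AUC is not inflated by data used for SNP selection or model training. The sort-based estimator runs in O(n log n), which keeps it practical for test sets of tens of thousands of subjects.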