Abstract
Type 2 diabetes (T2D) is a heterogeneous illness caused by genetic and environmental factors. Previous genome-wide association studies (GWAS) have identified many genetic variants associated with T2D and found evidence of differing genetic profiles by age-at-onset. This study further explores the genetic and environmental drivers of T2D by analyzing subgroups defined by age-at-onset of diabetes and body mass index (BMI). In the UK Biobank, 36 494 T2D cases were stratified into three subgroups, and GWAS was performed for all T2D cases and for each subgroup relative to 421 021 controls. Altogether, 18 single nucleotide polymorphisms (SNPs) were associated with T2D at genome-wide significance in one or more subgroups and also showed evidence of heterogeneity between the subgroups (Cochran's Q P < 0.01), with two SNPs (in CDKN2B and CYTIP) remaining significant after correction for multiple testing. Combined risk scores, on the basis of genetic profile, BMI and age, resulted in excellent diabetes prediction [area under the ROC curve (AUC) = 0.92]. A modest improvement in prediction (AUC = 0.93) was seen when the contribution of genetic and environmental factors was evaluated separately for each subgroup. The increasing sample sizes of genetic studies make it possible to stratify disease cases into subgroups that retain sufficient power to highlight areas of genetic heterogeneity. Despite some evidence that optimizing combined risk scores by subgroup improves prediction, larger sample sizes are likely needed for prediction when using a stratification approach.
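The between-subgroup heterogeneity test mentioned above can be illustrated with a minimal sketch of Cochran's Q for a single SNP, assuming each subgroup GWAS yields an effect estimate (log odds ratio) and its standard error. The function names `chi2_sf` and `cochran_q` and the input values are hypothetical and purely illustrative; they are not taken from the study, and this is not necessarily how the authors computed the statistic.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Starts from the closed forms for df = 1 or df = 2 and steps up with the
    recurrence for the regularized upper incomplete gamma function.
    """
    h = x / 2.0
    if df % 2 == 0:
        a, sf = 1.0, math.exp(-h)              # df = 2
    else:
        a, sf = 0.5, math.erfc(math.sqrt(h))   # df = 1
    while a < df / 2.0:
        sf += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return sf


def cochran_q(betas, ses):
    """Cochran's Q across subgroup effect estimates for one SNP.

    betas: per-subgroup effect estimates (e.g. log odds ratios)
    ses:   matching standard errors
    Returns (Q, degrees of freedom, P value).
    """
    weights = [1.0 / se ** 2 for se in ses]    # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    return q, df, chi2_sf(q, df)


# Illustrative (made-up) estimates for three subgroups
q, df, p = cochran_q([0.0, 0.1, 0.2], [0.1, 0.1, 0.1])
print(f"Q = {q:.3f}, df = {df}, P = {p:.4f}")
```

With three subgroups the statistic has two degrees of freedom, so a SNP would be flagged as heterogeneous under the threshold quoted above when the returned P value falls below 0.01.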