Abstract
OBJECTIVES: We aimed to compare the performance of approaches for classifying insulin-treated diabetes within research datasets without measured classification biomarkers, evaluated against two independent biological definitions of diabetes type.
STUDY DESIGN AND SETTING: We compared the accuracy of ten reported approaches for classifying insulin-treated diabetes as type 1 (T1D) or type 2 (T2D) diabetes in two cohorts: UK Biobank (UKBB; n = 26,399) and Diabetes Alliance for Research in England (DARE; n = 1,296). Overall classification performance was assessed against a T1D genetic risk score with a genetic stratification method in UKBB, and against C-peptide measured at >3 years of diabetes duration in DARE.
RESULTS: Accuracy of the approaches ranged from 71% to 88% in UKBB and from 68% to 88% in DARE. When classifying all participants, two approaches consistently achieved high accuracy: combining early insulin requirement with a T1D probability model (incorporating diagnosis age and body mass index [BMI]), and interview-reported diabetes type (in UKBB, available for only 15%), with accuracies of 87% and 87% in UKBB and 85% and 88% in DARE, respectively. For identifying T1D with minimal misclassification, models with high thresholds or young diagnosis age (<20 years) had the highest performance. Findings were incorporated into an online tool that identifies the optimum approach based on variable availability.
CONCLUSION: Models combining continuous features with early insulin requirement are the most accurate methods for classifying insulin-treated diabetes in research datasets without measured classification biomarkers.
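
ILLUSTRATIVE SKETCH: The following minimal Python sketch shows one way an approach of the kind described above could be expressed and scored. It is not the study's implementation. The function names, the logistic form of the probability model, the rule that requires both early insulin requirement and a probability at or above a threshold, and the default threshold of 0.5 are all assumptions made for illustration; the model coefficients are deliberately left as caller-supplied parameters because they are not given here.

```python
import math


def t1d_probability(age_at_diagnosis, bmi, intercept, age_coef, bmi_coef):
    """Hypothetical logistic T1D probability model.

    The coefficients are supplied by the caller; none are taken from the study.
    """
    z = intercept + age_coef * age_at_diagnosis + bmi_coef * bmi
    return 1.0 / (1.0 + math.exp(-z))


def classify(early_insulin, probability, threshold=0.5):
    """Assumed combination rule: label T1D only when insulin was required
    early AND the model probability meets the threshold; otherwise T2D."""
    return "T1D" if early_insulin and probability >= threshold else "T2D"


def accuracy(predicted, reference):
    """Proportion of participants whose predicted type matches the
    reference (for example, a genetic or C-peptide based) definition."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")
    if not predicted:
        raise ValueError("at least one participant is required")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(predicted)
```

Raising the `threshold` argument makes the T1D label more conservative, which corresponds to the goal of identifying T1D with minimal misclassification at the cost of labelling fewer participants as T1D.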