Abstract
Background: Efforts to identify subtypes of depression have not yet reached a consensus. In this study, we aimed to rigorously compare different subtyping approaches within the same participant sample, to quantify agreement between approaches, and to determine whether different approaches are sensitive to different sources of heterogeneity in depression.
Methods: We applied 6 data-driven subtyping methods from previous work to the same set of UK Biobank participants (n = 2276 participants with depression, n = 1595 healthy control participants). The 6 approaches comprised 2 symptom-based, 2 structural neuroimaging-based, and 2 functional neuroimaging-based techniques. We compared the resulting subtypes on participant assignment, stability, and sensitivity to between-subtype differences in demographics, general health, clinical characteristics, neuroimaging, trauma, cognition, genetics, and inflammation markers.
Results: We found almost no agreement among the subtypes produced by the 6 approaches (mean adjusted Rand index [ARI] = 0.006), even among approaches drawing on the same data domain. This disagreement was driven largely by differences in input feature sets (mean ARI = 0.005) rather than by differences in clustering algorithm (mean ARI = 0.23). However, each approach showed relatively high internal stability across bootstrap resamples (ARI = 0.36-0.89), most approaches performed above the null, and most approaches were sensitive to relevant phenotypes within their own data domain.
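Agreement above is measured with the adjusted Rand index (ARI), which is 1 for identical partitions of the same participants and has an expected value of 0 for chance-level agreement (it can be negative). As an illustration only, the sketch below computes the standard Hubert-Arabie ARI with the Python standard library and averages it over all pairs of labelings. It assumes each labeling assigns a subtype to the same participants in the same order, and that a "mean ARI" is the average over all pairs of approaches (15 pairs for 6 approaches). The function names are hypothetical; this is not the authors' analysis code.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions of the same participants."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same participants")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two participants")
    # Number of participant pairs placed together in each contingency-table cell,
    # and within each subtype of labeling A (rows) and labeling B (columns).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Expected pair agreement under random labelings with the same subtype sizes.
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. both labelings put everyone in one subtype).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def mean_pairwise_ari(labelings):
    """Average ARI over all unordered pairs of labelings (15 pairs for 6 approaches)."""
    scores = [adjusted_rand_index(a, b) for a, b in combinations(labelings, 2)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    a = [0, 0, 1, 1]
    b = [1, 1, 0, 0]  # same grouping as a, different subtype names
    c = [0, 1, 0, 1]  # grouping that splits every pair a puts together
    print(adjusted_rand_index(a, b))            # 1.0
    print(round(adjusted_rand_index(a, c), 3))  # -0.5
```

Because ARI depends only on which participants are grouped together, relabeling subtypes (as in `a` versus `b`) does not change the score. This is why ARI is a natural choice for comparing subtyping approaches whose subtype labels are arbitrary.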
Conclusions: Despite marginal overlap between approaches, we found each subtyping approach to be internally consistent. These results explain why previous studies have each found strong evidence for subtypes within their own analyses, yet shown very little convergence with one another. We recommend that future studies systematically compare their approach with alternative and previous approaches to facilitate consensus on depression subtypes.