Abstract
For many problems in neuroimaging, the most informative features occur in the tails of the distribution. For example, when psychiatric disorders are considered as deviations from a 'norm', the tails are considerably more informative than the bulk of the distribution for understanding risk, for stratifying and predicting such disorders, and for anomaly detection. Yet most statistical methods used in neuroimaging focus on modeling the bulk and fail to adequately capture extreme values occurring in the tails. To address this, we propose a framework that combines normative models with multivariate extreme value statistics to accurately model extreme deviations from a reference cohort for individual participants. Normative models are now widely used in clinical neuroscience. Much as normative growth charts are used in pediatric medicine to track a child's weight in relation to their age, normative models can be used with neuroimaging measurements to quantify individual neurophenotypic deviations from a reference cohort. However, a formal statistical treatment of how to model extreme deviations from these models has been lacking until now. In this article, we provide such an approach, inspired by applications of extreme value statistics in meteorology. Since the presentation of extreme value statistics is quite technical, we begin with a non-technical introduction to its fundamental principles, showing how they can be used to accurately map the tails of the normative distribution for biological markers, including multivariate tail dependence across multiple markers. Next, we apply this approach to the UK Biobank dataset and demonstrate how extreme values can be used to accurately estimate risk and detect atypicality.
This framework provides a valuable tool for the statistical modeling of extreme deviations in neurobiological data, which could provide us with more accurate and effective diagnostic tools for neurological and psychiatric disorders.