Abstract
Background: Measurement of multimorbidity, the co-occurrence of two or more conditions in the same individual, is highly variable, which limits the consistency and reproducibility of research.

Methods: Using data from 172,563 UK Biobank (UKB) participants and a cross-sectional approach, we examined how the choice of data source affected the estimated prevalence of 80 individual long-term conditions (LTCs) and of multimorbidity. We developed code-list-based algorithms to determine the prevalence of the 80 LTCs in (1) primary care records, (2) the UKB baseline assessment, (3) hospital/cancer registry records, and (4) all three data sources together.

Results: Using records from all three data sources, 146,811 (85.1%) participants have at least one LTC and 109,609 (63.5%) have at least two LTCs at baseline. A median of 4.7% (IQR 1.0-16.6) of participants with a condition are identified by all three data sources. Agreement is highest for endocrine, nutritional and metabolic disorders, with a median of 32.9% (IQR 20.5-34.1) of individuals with a condition identified by all three data sources. Agreement is lowest for diseases of the genitourinary system and for mental and behavioural disorders, where perfect agreement varies across conditions from zero to 4.9% and from zero to 12.3%, respectively. The low agreement between data sources is accompanied by high proportions of individuals with a condition identified only in primary care data (i.e. not in either of the other two sources), with a median of 59.3% (IQR 47.4-75.9) for diseases of the genitourinary system and 66.9% (IQR 42.8-79.2) for mental and behavioural disorders.

Conclusions: Our study highlights how the choice of data source affects research on individual LTCs and multimorbidity, and the importance of clearly justifying the choices made.
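To make the agreement measures concrete, the following is a minimal sketch of how, for a single condition, the share of cases identified by all three data sources and the share identified only in primary care could be computed. It is not the study's implementation: the function name `source_agreement`, the use of participant-ID sets, and the example IDs are all hypothetical, and the denominator is assumed to be every participant identified with the condition in at least one source.

```python
def source_agreement(primary_care, baseline, hospital):
    """Summarise agreement between three data sources for one condition.

    Each argument is an iterable of participant IDs flagged with the
    condition in that source. The denominator is assumed to be everyone
    flagged in at least one source (an assumption, not taken from the study).
    """
    pc, bl, hosp = set(primary_care), set(baseline), set(hospital)
    any_source = pc | bl | hosp
    if not any_source:
        # No cases in any source: percentages are undefined.
        return {"n_cases": 0, "all_three_pct": None, "primary_care_only_pct": None}

    all_three = pc & bl & hosp      # flagged in every source
    pc_only = pc - bl - hosp        # flagged in primary care and nowhere else
    n = len(any_source)
    return {
        "n_cases": n,
        "all_three_pct": 100 * len(all_three) / n,
        "primary_care_only_pct": 100 * len(pc_only) / n,
    }


# Made-up participant IDs, for illustration only.
primary_care = {1, 2, 3, 4, 5, 6}
baseline = {1, 2, 7}
hospital = {1, 8}

result = source_agreement(primary_care, baseline, hospital)
print(result)
```

In this toy example, eight participants have the condition in at least one source; one of them appears in all three sources and four appear only in primary care. Repeating the calculation per condition and taking the median and IQR within a disease group would give summaries of the kind reported above, under the same assumptions.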