Abstract
Early diagnosis of cancer relies on accurate assessment of cancer risk in patients presenting with symptoms, when screening is not appropriate. However, the symptoms recorded in cancer patients pre-diagnosis may vary between different sources of electronic health records (EHRs), either genuinely or because of differential completeness of symptom recording. To assess possible differences, we analysed primary care EHRs in the year pre-diagnosis of cancer in UK Biobank and Clinical Practice Research Datalink (CPRD) populations linked to cancer registry data. We developed harmonised phenotypes in the Read v2 and CTV3 coding systems for 21 symptoms and eight blood tests relevant to cancer diagnosis. Among 22,601 CPRD and 11,594 UK Biobank cancer patients, 54% and 36%, respectively, had at least one consultation for possible cancer symptoms recorded in the year before their diagnosis. Adjusted comparisons between datasets were made using multivariable Poisson models, comparing rates of symptoms/tests in CPRD against the rates expected if cancer site-age-sex-deprivation associations were the same as in UK Biobank. Compared with those in CPRD, UK Biobank cancer patients had lower rates of consultation for possible cancer symptoms [RR: 0.61 (95% CI 0.59-0.63)] and lower rates of any primary care consultation [RR: 0.86 (95% CI 0.85-0.87)]. Differences were larger for 'non-alarm' symptoms [RR: 0.54 (95% CI 0.52-0.56)], and smaller for 'alarm' symptoms [RR: 0.80 (95% CI 0.76-0.84)] and blood tests [RR: 0.93 (95% CI 0.90-0.95)]. In the CPRD cohort, which is approximately representative of the UK population, half of cancer patients had recorded symptoms in the year before diagnosis. The frequency of non-specific presenting symptoms recorded in the year pre-diagnosis of cancer was substantially lower among UK Biobank participants. The degree to which results based on highly selected biobank cohorts are generalisable needs to be examined in disease-specific contexts.
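The observed-versus-expected comparison described above can be illustrated with a minimal sketch. This is not the multivariable Poisson model used in the study: it is a simpler stratified (indirect standardisation) analogue, in which stratum-specific rates from a reference cohort are applied to the person-time of a target cohort to obtain expected counts. The function name, the stratum keys and all counts are hypothetical and chosen only for illustration. The confidence interval uses the common log-normal approximation and treats the expected count as fixed, so it ignores uncertainty in the reference rates.

```python
import math


def observed_expected_ratio(reference, target, z=1.96):
    """Ratio of observed to expected events in the target cohort.

    reference, target: dict mapping stratum -> (events, person_years).
    Expected events are the reference stratum rates applied to the
    target person-years (hypothetical helper, not the study's code).
    """
    observed = 0
    expected = 0.0
    for stratum, (t_events, t_person_years) in target.items():
        r_events, r_person_years = reference[stratum]
        reference_rate = r_events / r_person_years
        expected += reference_rate * t_person_years
        observed += t_events

    ratio = observed / expected
    # Approximate standard error of log(ratio), expected count taken as fixed.
    se_log = 1.0 / math.sqrt(observed)
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper


# Hypothetical counts per (cancer site, age band, sex) stratum.
reference = {
    ("colorectal", "60-69", "F"): (40, 100.0),
    ("lung", "60-69", "M"): (30, 100.0),
}
target = {
    ("colorectal", "60-69", "F"): (90, 150.0),
    ("lung", "60-69", "M"): (60, 120.0),
}

ratio, lower, upper = observed_expected_ratio(reference, target)
print(f"ratio = {ratio:.2f} ({lower:.2f}-{upper:.2f})")
```

A model-based analysis would instead fit a Poisson regression with person-time as an offset, which allows adjustment for covariates that are too sparse to stratify on.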