Abstract
BACKGROUND: Biobanks typically rely on volunteer-based sampling. This results in large samples (power) at the cost of representativeness (bias). The problem of volunteer bias is debated. Here, we (i) show that volunteering biases associations in UK Biobank (UKB) and (ii) estimate inverse probability (IP) weights that correct for volunteer bias in UKB.</p>
METHODS: Drawing on UK Census data, we constructed a subsample representative of UKB's target population, which consists of all individuals invited to participate. Based on demographic variables shared between the UK Census and UKB, we estimated IP weights (IPWs) for each UKB participant. We compared 21 weighted and unweighted bivariate associations between these demographic variables to assess volunteer bias.</p>
RESULTS: Volunteer bias in all associations, as naively estimated in UKB, was substantial-in some cases so severe that unweighted estimates had the opposite sign of the association in the target population. For example, older individuals in UKB reported being in better health, in contrast to evidence from the UK Census. Using IPWs in weighted regressions reduced 87% of volunteer bias on average. Volunteer-based sampling reduced the effective sample size of UKB substantially, to 32% of its original size.</p>
CONCLUSIONS: Estimates from large-scale biobanks may be misleading due to volunteer bias. We recommend IP weighting to correct for such bias. To aid in the construction of the next generation of biobanks, we provide suggestions on how to best ensure representativeness in a volunteer-based design. For UKB, IPWs have been made available.</p>