Data providers and dates of data availability

This page lists the data providers for linked death, hospital inpatient, cancer and primary care (GP) records, and for COVID-19 test results. For each type of data and data provider it also gives the period for which data is available: the earliest date for which we have data, and an estimated censoring date determined as described in the next sections.

Censoring dates

The censoring date for each data provider is the date up to which UK Biobank estimates that the data received from that provider is mostly complete. Our general rule for estimating an appropriate censoring date is as follows:

The censoring date is the last day of the month for which the number of records is greater than 90% of the mean number of records for the previous three months, except where the data for that month is known to be incomplete, in which case the censoring date is the last day of the previous month.
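The rule can be illustrated with a minimal Python sketch. It is not UK Biobank's own implementation: the function names, the `(year, month)` keys and the example counts are all hypothetical, and it assumes one reading of the rule, namely that the censoring month is the most recent month whose record count exceeds 90% of the mean of the three months before it.

```python
import calendar
from datetime import date


def month_end(year, month):
    """Return the last calendar day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])


def estimate_censoring_date(monthly_counts, known_incomplete=frozenset(), threshold=0.9):
    """Estimate a censoring date from monthly record counts.

    monthly_counts maps (year, month) to a record count and is assumed to
    cover consecutive months with no gaps. known_incomplete is a set of
    (year, month) keys for months known to be incomplete.
    """
    months = sorted(monthly_counts)
    # Scan backwards from the most recent month; a month needs three
    # preceding months, so index 3 is the earliest candidate.
    for i in range(len(months) - 1, 2, -1):
        previous_mean = sum(monthly_counts[m] for m in months[i - 3:i]) / 3
        if monthly_counts[months[i]] > threshold * previous_mean:
            # Step back one month if the chosen month is known to be incomplete.
            chosen = months[i - 1] if months[i] in known_incomplete else months[i]
            return month_end(*chosen)
    return None  # not enough data, or no month meets the threshold


# Hypothetical counts: the last two months tail off sharply.
counts = {
    (2022, 6): 1000, (2022, 7): 1020, (2022, 8): 980, (2022, 9): 1010,
    (2022, 10): 990, (2022, 11): 400, (2022, 12): 50,
}
print(estimate_censoring_date(counts))  # 2022-10-31
```

In this example November and December fall well below 90% of their preceding three-month means, so October is the last month treated as complete.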

UK Biobank does not apply the censoring dates to the data made available to researchers, which always contains the latest data regardless of censoring dates and may include incomplete data after the dates below. These dates are intended for guidance only. Once researchers have received their data, they should censor outcomes according to their own research protocol.
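As one possible way of applying provider-specific censoring in an analysis, the sketch below drops events dated after the censoring date of their source. The cut-offs are the hospital inpatient censoring dates listed further down this page; the record layout (`source` and `date` keys) and the function name are hypothetical, and a real protocol may call for different cut-offs.

```python
from datetime import date

# Hospital inpatient censoring dates as listed on this page.
CENSORING_DATES = {
    "HES": date(2022, 10, 31),   # England
    "SMR": date(2022, 8, 31),    # Scotland
    "PEDW": date(2022, 5, 31),   # Wales
}


def censor_events(events, censoring_dates=CENSORING_DATES):
    """Keep only events dated on or before their source's censoring date.

    Each event is a dict with hypothetical keys "source" and "date".
    Events from a source with no known censoring date are dropped.
    """
    kept = []
    for event in events:
        cutoff = censoring_dates.get(event["source"])
        if cutoff is not None and event["date"] <= cutoff:
            kept.append(event)
    return kept


# Hypothetical example records.
events = [
    {"eid": 1, "source": "HES", "date": date(2022, 10, 31)},
    {"eid": 2, "source": "HES", "date": date(2022, 11, 1)},
    {"eid": 3, "source": "PEDW", "date": date(2022, 6, 15)},
    {"eid": 4, "source": "SMR", "date": date(2021, 3, 2)},
]
print([e["eid"] for e in censor_events(events)])  # [1, 4]
```

Dropping events from unknown sources is a conservative choice made for this sketch; an analysis could equally raise an error instead.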

The Data Portal vs Showcase fields

Hospital inpatient data and death data are available to researchers in two formats: record-level fields on the Data Portal and Showcase fields, which in the case of inpatient data give summary-level information about the hospital episodes. In addition, there are Showcase fields such as the Algorithmically-defined outcomes (Category 42) and the First Occurrences (Category 1712) which combine information from multiple sources (such as self-report at Biobank assessment centres, inpatient, death, and GP records).

Data currently available

Death data

ICD = International Classification of Diseases.

| Death | Data provider | ICD9 | ICD10 | Period of data currently available | Censoring date |
| --- | --- | --- | --- | --- | --- |
| England & Wales | NHS England | | 2006 onwards | April 2006 onwards | 30 November 2022 * |
| Scotland | NHS Central Register, National Records of Scotland | | 2006 onwards | April 2006 onwards | 30 November 2022 * |

* The censoring dates above reflect the data published on Research Analysis Platform (RAP); the summary statistics for death data fields on Showcase may reflect further updates that have not yet been published.

Hospital inpatient data

ICD = International Classification of Diseases; OPCS = Classification of Interventions and Procedures.

| Hospital admissions (inpatients) | Data provider | ICD9 | ICD10 | OPCS3 | OPCS4 | Period of data currently available | Censoring date |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Hospital Episode Statistics for England (HES) | NHS England | | 1997 onwards | | 1997 onwards | 1997 onwards, with critical care data from 2011 | 31 October 2022 |
| Scottish Morbidity Record (SMR) | Information and Statistics Division (ISD), Scotland | 1981 - 1996 | 1996 onwards | 1977 - 1988 | 1989 onwards | 1981 onwards | 31 August 2022 * |
| Patient Episode Database for Wales (PEDW) | Secure Anonymised Information Linkage (SAIL), Wales | | 1999 onwards | | 1999 onwards | 1991 onwards | 31 May 2022 |

*The Scottish hospital inpatient data does not currently include maternity admissions. The Showcase summary fields contain Scottish data for the period August to September 2021 (as well as a very small number of other isolated records) that do not yet appear on the Data Portal and the RAP; these records will be updated on the Data Portal and RAP at the next Showcase release.

Notes on the English & Scottish inpatient data:


Primary care (GP) data - for all research

Note that the GP data available for all research covers a total of approximately 45% of the UK Biobank cohort. The coverage dates are based on the value of the field event_dt (event date) in the gp_clinical table.

| GP dataset | Data provider | Participant coverage (approx.) | Coding systems | Period of data currently available | Censoring date |
| --- | --- | --- | --- | --- | --- |
| England | TPP | 165,000 | See Resource 591 | 1938 onwards * | 31 May 2016 |
| England | Vision | 18,000 | See Resource 591 | 1940 onwards * | 31 May 2017 |
| Scotland | Vision/EMIS | 27,000 | See Resource 591 | 1939 onwards * | 31 March 2017 |
| Wales | Vision/EMIS | 21,000 | See Resource 591 | 1948 onwards * | 31 August 2017 |

* For each provider the number of records per year is very low initially and gradually increases.
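To see how sparse the early years are for a given provider, the event dates can be tallied by year. The sketch below assumes the rows have already been read from the gp_clinical table into `(data_provider, event_dt)` pairs with `event_dt` parsed as a `datetime.date`; the provider labels, the function name and the example rows are hypothetical.

```python
from collections import Counter
from datetime import date


def records_per_year(rows):
    """Count records per (provider, year) from (data_provider, event_dt) pairs.

    Rows with a missing event date are skipped.
    """
    counts = Counter()
    for provider, event_dt in rows:
        if event_dt is None:
            continue
        counts[(provider, event_dt.year)] += 1
    return counts


# Hypothetical example rows.
sample_rows = [
    ("TPP", date(1938, 5, 1)),
    ("TPP", date(2015, 3, 2)),
    ("TPP", date(2015, 7, 9)),
    ("Vision", None),
]
counts = records_per_year(sample_rows)
print(counts[("TPP", 2015)])  # 2
```

The same tally, taken by month rather than by year, gives the kind of monthly record counts on which the censoring rule described earlier operates.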

Primary care (GP) data - for COVID-19 research

GP data for COVID-19 research were made available to approved researchers between 2020 and 2021, which covered the majority of the UKB cohort (~450,000 participants) and the period 1938 to 31st August 2021 (see Resource 3151). These data are no longer available since the withdrawal of the UK Government COPI (Control of Patient Information) notice on 1st July 2022.

Cancer data

ICD = International Classification of Diseases.

| Cancer | Data provider | ICD9 | ICD10 | Period of data currently available | Censoring date |
| --- | --- | --- | --- | --- | --- |
| England | NHS England | 1979 - 1994 | 1995 onwards | 1971 onwards | 31 December 2020 |
| Wales | NHS England | 1979 - 1994 | 1995 onwards | 1971 onwards | 31 December 2016 * |
| Scotland | National Records of Scotland, NHS Central Register | 1980 - 1996 | 1997 onwards | 1957 onwards | 30 November 2021 |

* Welsh cancer registry data was originally provided by NHS England; however, in mid-2023 it was discovered that NHS England cancer registry extracts had stopped including Welsh data from 2017 onwards. The censoring date for Welsh cancer records was therefore revised back to January 2017, and we are currently investigating alternative sources for this data.

COVID-19 test results

| COVID-19 test | Data provider | Coding systems | Period of data currently available |
| --- | --- | --- | --- |
| England | Public Health England | See data dictionary | Early 2020 - September 2022 * |
| Scotland | Public Health Scotland | See data dictionary | Early 2020 - November 2022 * |
| Wales | SAIL | See data dictionary | Early 2020 - December 2022 * |

* Given changes in testing levels, the standard definition for censoring dates is no longer applicable for COVID-19 testing data. The dates given refer to the full range of data available, and should not be used to infer completeness.

COVID-19 vaccination data

| COVID-19 vaccinations | Data provider | Coding systems | Period of data currently available |
| --- | --- | --- | --- |
| England | NHS England | See Resource 2910 | May 2020 - June 2023 * |

* The sporadic nature of vaccination appointments leads to date clustering at specific times of year. This means that the standard definition for censoring dates is not applicable for COVID-19 vaccination data. The dates given refer to the full range of data available, and should not be used to infer completeness.
