Data providers and dates of data availability

This page lists the data providers for linked death, hospital inpatient, cancer and primary care (GP) records. For each type of data and each data provider it also gives the period for which data is available: the earliest date for which we have data, and an estimated censoring date determined as described in the next sections.

Censoring dates

The censoring date for each data provider is the date up to which UK Biobank estimates that the data received from that provider is mostly complete. Our general rule for estimating an appropriate censoring date is as follows:

The censoring date is the last day of the month for which the number of records is greater than 90% of the mean number of records for the previous three months, except where the data for that month is known to be incomplete, in which case the censoring date is the last day of the previous month.
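
The rule can be sketched in a few lines of Python. This is an illustration only, not UK Biobank's own implementation: the function name, the input format (a dictionary of monthly record counts with no missing months) and the choice to take the most recent qualifying month are all assumptions, and the counts in the usage example are made up.

```python
import calendar
from datetime import date


def estimate_censoring_date(monthly_counts, known_incomplete=frozenset()):
    """Return the estimated censoring date, or None if no month qualifies.

    monthly_counts: dict mapping (year, month) -> number of records,
        assumed to cover consecutive months with no gaps.
    known_incomplete: set of (year, month) known to be incomplete.
    """
    months = sorted(monthly_counts)
    # Assumption: walk backwards and take the most recent qualifying month.
    # The first three months have no full three-month history, so skip them.
    for i in range(len(months) - 1, 2, -1):
        previous = months[i - 3:i]
        mean_previous = sum(monthly_counts[m] for m in previous) / 3
        if monthly_counts[months[i]] > 0.9 * mean_previous:
            year, month = months[i]
            if (year, month) in known_incomplete:
                # Exception in the rule: use the previous month instead.
                year, month = months[i - 1]
            last_day = calendar.monthrange(year, month)[1]
            return date(year, month, last_day)
    return None


# Made-up monthly counts in which the final month is clearly incomplete.
counts = {
    (2020, 4): 1000, (2020, 5): 980, (2020, 6): 1010,
    (2020, 7): 990, (2020, 8): 960, (2020, 9): 400,
}
print(estimate_censoring_date(counts))  # 2020-08-31
```

In this example September falls well below 90% of the mean for June to August, so the estimate falls back to the end of August.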

UK Biobank does not apply the censoring dates to the data made available to researchers, which always contains the latest data regardless of censoring dates and may include incomplete data after the dates below. These dates are intended for guidance only. Once researchers have received their data, they should censor outcomes according to their own research protocol.
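
As a minimal sketch of what censoring an outcome might look like in practice, the function below treats any event recorded after a researcher-chosen cut-off as unobserved. The function name and return format are hypothetical, and a real protocol would usually also account for other reasons for ending follow-up (for example death or loss to follow-up), which are not handled here.

```python
from datetime import date


def censor_outcome(event_date, censoring_date):
    """Return (had_event, end_of_follow_up) for one participant.

    event_date: date of the outcome, or None if no event was recorded.
    censoring_date: cut-off chosen by the researcher for this data source.
    """
    if event_date is not None and event_date <= censoring_date:
        return True, event_date
    # No event, or an event recorded after the cut-off: treat as censored.
    return False, censoring_date


cutoff = date(2020, 8, 31)  # example cut-off; choose per your own protocol
print(censor_outcome(date(2019, 5, 2), cutoff))   # event counted
print(censor_outcome(date(2020, 9, 10), cutoff))  # censored at the cut-off
```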

The Data Portal vs Showcase fields

Hospital inpatient data and death data are available to researchers in two formats: record-level fields on the Data Portal, and Showcase fields, which in the case of inpatient data give summary-level information about the hospital episodes. In addition, Showcase fields such as the Algorithmically-defined outcomes (Category 42) and the First Occurrences (Category 1712) combine information from multiple sources (such as self-report at Biobank assessment centres, inpatient, death and GP records).

These derived Showcase fields are updated less frequently than the Data Portal tables, so two censoring dates are given below. The "Data Portal censoring date" is the date to which the tables on the Data Portal are complete, and the "Showcase censoring date" is the date to which derived Showcase fields use complete data from that source. In both cases, "complete" means complete as determined by the rule above.

The Data Portal will in general be more up to date than the Showcase fields available in a main dataset. Researchers requiring the most current data should use the record-level fields.

Data available for all research purposes

Death data

Death | Data Provider | International Classification of Diseases (ICD): ICD9 | International Classification of Diseases (ICD): ICD10 | Period of data currently available | Showcase censoring date | Data Portal censoring date
--- | --- | --- | --- | --- | --- | ---
England & Wales | NHS Digital | - | 2006 onwards | April 2006 onwards | 31 January 2018 * | 31 August 2020 **
Scotland | NHS Central Register, National Records of Scotland | - | 2006 onwards | April 2006 onwards | 30 November 2016 * | 31 August 2020 **

* Note that the Showcase fields in Category 100093 (Death register) are complete to the end of March 2020, but the data used for derived Showcase fields (such as the Algorithmically-defined outcomes) is only complete to the dates given above.

** Note that in both cases the data continues well into September 2020 but is not complete for that month.

Hospital inpatient data

Hospital Admissions (Inpatients) | Data Provider | International Classification of Diseases (ICD): ICD9 | International Classification of Diseases (ICD): ICD10 | Classification of Interventions and Procedures (OPCS): OPCS3 | Classification of Interventions and Procedures (OPCS): OPCS4 | Period of data currently available | Showcase censoring date | Data Portal censoring date
--- | --- | --- | --- | --- | --- | --- | --- | ---
Hospital Episode Statistics for England (HES) | NHS Digital | - | 1997 onwards | - | 1997 onwards | 1997 onwards, with critical care data from 2011 | 31 March 2017 * | 30 June 2020 **
Scottish Morbidity Record (SMR) | Information and Statistics Division (ISD), Scotland | 1981 - 1996 | 1996 onwards | 1977 - 1988 | 1989 onwards | 1981 onwards | 31 October 2016 *** | 31 October 2016 ***
Patient Episode Database for Wales (PEDW) | Secure Anonymised Information Linkage (SAIL), Wales | - | 1999 onwards | - | 1999 onwards | 1998 onwards | 29 February 2016 | 29 February 2016

* Note that the Showcase summary inpatient fields in Category 2001 through to Category 2005 are complete to the end of March 2020, but the English inpatient data used for other derived Showcase fields (such as the First Occurrences) is only complete to the date given above.

** Note that although the rule has given 30 June 2020 as the censoring date, it is possible that data for June is less complete than that for previous months. Note also that we have held back a very small proportion of inpatient data for April 2017 onwards (approximately 0.25%, or around 600 episodes per year) due to an incomplete linkage match. After they have been scrutinised further, some of these records may be released at a future date.

*** The Scottish hospital inpatient data does not currently include psychiatric or maternity admissions.
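
When combining inpatient data from the three providers, it can help to keep the censoring dates in one lookup and select the appropriate one for the kind of field being analysed. The sketch below copies the dates from the hospital inpatient table above; the dictionary keys and the function name are labels invented for this example. It does not cover the Showcase summary fields in Category 2001 to Category 2005, which the first note above describes separately.

```python
from datetime import date

# Censoring dates copied from the hospital inpatient table above.
# The keys "HES", "SMR" and "PEDW" are labels chosen for this sketch.
INPATIENT_CENSORING = {
    "HES": {"showcase": date(2017, 3, 31), "data_portal": date(2020, 6, 30)},
    "SMR": {"showcase": date(2016, 10, 31), "data_portal": date(2016, 10, 31)},
    "PEDW": {"showcase": date(2016, 2, 29), "data_portal": date(2016, 2, 29)},
}


def inpatient_censoring_date(source, derived_showcase_field):
    """Pick the censoring date for an inpatient data source.

    source: "HES", "SMR" or "PEDW".
    derived_showcase_field: True for derived Showcase fields (such as the
        First Occurrences), False for record-level Data Portal tables.
    """
    key = "showcase" if derived_showcase_field else "data_portal"
    return INPATIENT_CENSORING[source][key]


print(inpatient_censoring_date("HES", derived_showcase_field=True))
print(inpatient_censoring_date("HES", derived_showcase_field=False))
```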

Primary care (GP) data - for all research

Note that the GP data available for all research covers a total of approximately 45% of the UK Biobank cohort. The coverage dates are based on the value of the field event_dt (event date) in the gp_clinical table.

GP dataset | Data provider | Participant coverage (approx.) | Coding systems | Period of data currently available | Censoring date
--- | --- | --- | --- | --- | ---
England | TPP | 165,000 | See Resource 591 | 1938 onwards * | 31 May 2016
England | Vision | 18,000 | See Resource 591 | 1940 onwards * | 31 May 2017
Scotland | Vision/EMIS | 27,000 | See Resource 591 | 1939 onwards * | 31 March 2017
Wales | Vision/EMIS | 21,000 | See Resource 591 | 1948 onwards * | 31 August 2017

* For each provider the number of records per year is very low initially and gradually increases.
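
Because the early years contain very few records, researchers may want to inspect record counts per year before choosing a start date for their own analysis. The sketch below counts records by provider and year from the event_dt field. It assumes rows have already been loaded as dictionaries with event_dt parsed to a date; the "data_provider" key, the provider labels and the function names are assumptions made for this example.

```python
from collections import Counter
from datetime import date


def records_per_year(rows):
    """Count records per (data_provider, year) from gp_clinical-like rows.

    rows: iterable of dicts with a "data_provider" key and an "event_dt"
        key holding a datetime.date, or None when the date is missing.
    """
    counts = Counter()
    for row in rows:
        event_dt = row["event_dt"]
        if event_dt is None:
            continue  # records without an event date cannot be placed in time
        counts[(row["data_provider"], event_dt.year)] += 1
    return counts


def coverage_start(counts, provider):
    """Earliest year with at least one record for the given provider."""
    years = [year for (p, year) in counts if p == provider]
    return min(years) if years else None


# Made-up rows for illustration only.
rows = [
    {"data_provider": "TPP", "event_dt": date(1938, 6, 1)},
    {"data_provider": "TPP", "event_dt": date(2015, 3, 9)},
    {"data_provider": "TPP", "event_dt": date(2015, 7, 21)},
    {"data_provider": "Vision", "event_dt": None},
]
counts = records_per_year(rows)
print(counts[("TPP", 2015)], coverage_start(counts, "TPP"))
```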

Cancer data

Cancer | Data Provider | International Classification of Diseases (ICD): ICD9 | International Classification of Diseases (ICD): ICD10 | Period of data currently available | Censoring date
--- | --- | --- | --- | --- | ---
England & Wales | NHS Digital | 1979 - 1994 | 1995 onwards | 1971 onwards | 31 March 2016
Scotland | National Records of Scotland, NHS Central Register | 1980 - 1996 | 1997 onwards | 1957 onwards | 31 October 2015


Data available only for COVID-19 research

COVID-19 test results

For information on a censoring date for the COVID-19 test results data, please see the Data Portal updates table on the timelines page. The censoring date will generally be approximately one or two weeks before the "Last Update" date.
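
As a rough illustration of that guidance, the helper below turns a "Last Update" date into a window of plausible censoring dates. It assumes that "one or two weeks" means exactly 14 and 7 days, which is only an approximation; the function name and the example date are hypothetical.

```python
from datetime import date, timedelta


def covid_censoring_window(last_update):
    """Return (earliest, latest) plausible censoring dates.

    Assumes "one or two weeks" means 14 and 7 days before the last update.
    """
    return last_update - timedelta(weeks=2), last_update - timedelta(weeks=1)


earliest, latest = covid_censoring_window(date(2020, 10, 15))
print(earliest, latest)  # 2020-10-01 2020-10-08
```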

Primary care (GP) data - for COVID-19 research only

GP dataset | Data provider | Participant coverage (approx.) | Coding systems | Period of data currently available | Censoring date
--- | --- | --- | --- | --- | ---
England | TPP | 190,000 | See Resource 3151 | 1938 onwards * | 31 May 2020
England | EMIS | 260,000 | See Resource 3151 | 1938 onwards * | 30 June 2020

* For both providers the number of records per year is very low initially and gradually increases.
