Data providers and dates of data availability
This page gives details of the various data providers for linked death, hospital inpatient, cancer and primary care (GP) records. It also gives the period for which each type of data is available: the earliest date for which we have data and an estimated censoring date for each type of data and data provider, as described in the next sections.
Censoring dates
The censoring date for each data provider is the date up to which UK Biobank estimates that data received from that provider is mostly complete. Our general rule for estimating an appropriate censoring date is as follows:
The censoring date is the last day of the month for which the number of records is greater than 90% of the mean of the number of records for the previous three months,
except where the data for that month is known to be incomplete, in which case the censoring date is the last day of the previous month.
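The rule above can be sketched in Python as follows. This is a minimal illustration, not UK Biobank's own code: it assumes the monthly record counts are held in a dictionary keyed by (year, month) with no missing months, it reads the rule as selecting the most recent month that passes the 90% test, and the function name is hypothetical.

```python
from calendar import monthrange
from datetime import date


def estimate_censoring_date(monthly_counts, incomplete_months=frozenset()):
    """Sketch of the 90%-of-previous-three-months censoring rule.

    monthly_counts: dict mapping (year, month) -> number of records; assumed
    to contain every month in the range, with 0 for months without records.
    incomplete_months: (year, month) pairs known to be incomplete.
    Returns the last day of the selected month, or None if no month passes.
    """
    months = sorted(monthly_counts)
    selected = None
    # Start at the fourth month so that each candidate has three previous months.
    for i in range(3, len(months)):
        previous_mean = sum(monthly_counts[m] for m in months[i - 3:i]) / 3
        if monthly_counts[months[i]] > 0.9 * previous_mean:
            selected = i  # assumption: keep the most recent month that passes
    if selected is None:
        return None
    # If that month is known to be incomplete, fall back to the previous month.
    if months[selected] in incomplete_months:
        selected -= 1
    year, month = months[selected]
    # monthrange returns (weekday of day 1, number of days in the month).
    return date(year, month, monthrange(year, month)[1])


counts = {(2020, m): n for m, n in zip(range(1, 7), [100, 100, 100, 95, 50, 10])}
print(estimate_censoring_date(counts))  # 2020-04-30
```

With 100, 100, 100, 95, 50 and 10 records in January to June 2020, only April passes the test, so the sketch returns 30 April 2020; if April is flagged as incomplete, it returns 31 March 2020 instead. On a long, flat tail of very low counts this reading can still select a late month, so plotting the monthly counts is a sensible check.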
UK Biobank does not apply these censoring dates to the data made available to researchers. Released data always includes the latest records received, regardless of the censoring dates, and may include incomplete data after the dates given below. These dates are intended for guidance only: once researchers have received their data, they should censor outcomes according to their own research protocol.
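As an illustration of censoring by protocol, the sketch below assumes a protocol that ends each participant's follow-up at the earlier of a chosen censoring date and their date of death, and drops outcome events after that point. The function names and this particular end-of-follow-up rule are hypothetical choices for the example, not a UK Biobank requirement.

```python
from datetime import date
from typing import Iterable, List, Optional


def follow_up_end(censoring_date: date, death_date: Optional[date] = None) -> date:
    """Return the end of follow-up: the censoring date, or death if earlier."""
    if death_date is not None and death_date < censoring_date:
        return death_date
    return censoring_date


def censor_events(event_dates: Iterable[date], end: date) -> List[date]:
    """Keep only events dated on or before the end of follow-up."""
    return [d for d in event_dates if d <= end]


# Example: a protocol censoring date of 31 December 2020 and a death in June 2019.
end = follow_up_end(date(2020, 12, 31), death_date=date(2019, 6, 1))
print(end)  # 2019-06-01
print(censor_events([date(2018, 1, 1), date(2019, 6, 1), date(2021, 1, 1)], end))
```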
The Data Portal vs Showcase fields
Hospital inpatient data and death data are available to researchers in two formats: record-level fields on the Data Portal, and Showcase fields, which in the case of inpatient data give summary-level information about the hospital episodes. In addition, there are Showcase fields such as the Algorithmically-defined outcomes (Category 42) and the First Occurrences (Category 1712) which combine information from multiple sources (such as self-report at Biobank assessment centres, inpatient, death, and GP records).
These derived Showcase fields are updated less frequently than the Data Portal tables, so two censoring dates are given below. The "Data Portal censoring date" is the date to which the tables on the Data Portal are complete, and the "Showcase censoring date" is the date up to which the derived Showcase fields use complete data from that source. In both cases, "complete" means complete according to the rule above.
The Data Portal will in general be more up to date than the Showcase fields available in a main dataset. Researchers requiring the most current data should use the record-level fields.
Data available for all research purposes
Death data
ICD = International Classification of Diseases.

| Death | Data provider | ICD9 | ICD10 | Period of data currently available | Showcase censoring date | Data Portal censoring date |
|---|---|---|---|---|---|---|
| England & Wales | NHS Digital | | 2006 onwards | April 2006 onwards | 31 December 2020 * | 28 February 2021 |
| Scotland | NHS Central Register, National Records of Scotland | | 2006 onwards | April 2006 onwards | 31 December 2020 * | 28 February 2021 |
* This Showcase censoring date applies only to the First Occurrence fields, not to the Showcase death fields or the Algorithmically-defined outcome fields (Category 42).
Hospital inpatient data
ICD = International Classification of Diseases; OPCS = Classification of Interventions and Procedures.

| Hospital admissions (inpatients) | Data provider | ICD9 | ICD10 | OPCS3 | OPCS4 | Period of data currently available | Showcase censoring date | Data Portal censoring date |
|---|---|---|---|---|---|---|---|---|
| Hospital Episode Statistics for England (HES) | NHS Digital | | 1997 onwards | | 1997 onwards | 1997 onwards, with critical care data from 2011 | 31 December 2020 * | 31 December 2020 ** |
| Scottish Morbidity Record (SMR) | Information and Statistics Division (ISD), Scotland | 1981–1996 | 1996 onwards | 1977–1988 | 1989 onwards | 1981 onwards | 31 December 2020 * | 31 December 2020 ** |
| Patient Episode Database for Wales (PEDW) | Secure Anonymised Information Linkage (SAIL), Wales | | 1999 onwards | | 1999 onwards | 1998 onwards | 28 February 2018 * | 28 February 2018 |
* Not including the Algorithmically-defined outcome fields (Category 42).
** Although these are the censoring dates according to the rule above, data for the final month may be less complete than data for previous months.
We have held back a very small proportion of English inpatient data from April 2017 onwards (approximately 0.25%, or around 600 episodes per year) because of an incomplete linkage match. Some of these records may be released at a future date after further scrutiny.
The Scottish hospital inpatient data does not currently include psychiatric or maternity admissions.
Primary care (GP) data - for all research
Note that the GP data available for all research covers a total of approximately 45% of the UK Biobank cohort. The coverage dates are based on the value of the field event_dt (event date) in the gp_clinical table.
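Because the coverage dates are based on event_dt, the earliest date and the monthly record counts used by the censoring rule can be derived directly from that field. The sketch below assumes the event_dt values have already been parsed into datetime.date objects and that any placeholder or invalid dates have been filtered out beforehand; the function name is hypothetical.

```python
from collections import Counter
from datetime import date


def monthly_record_counts(event_dates):
    """Count records per calendar month, filling empty months with zero.

    event_dates: iterable of datetime.date values (e.g. parsed event_dt).
    Returns (earliest_date, {(year, month): count}) in month order.
    """
    dates = list(event_dates)
    if not dates:
        return None, {}
    counts = Counter((d.year, d.month) for d in dates)
    first, last = min(dates), max(dates)
    year, month = first.year, first.month
    filled = {}
    while (year, month) <= (last.year, last.month):
        filled[(year, month)] = counts.get((year, month), 0)
        # Advance one calendar month, rolling over at December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return first, filled


first, counts = monthly_record_counts([date(2016, 1, 5), date(2016, 1, 20), date(2016, 3, 2)])
print(first)   # 2016-01-05
print(counts)  # {(2016, 1): 2, (2016, 2): 0, (2016, 3): 1}
```

Filling empty months with zero keeps the month sequence contiguous, which a comparison against the previous three months relies on.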
| GP dataset | Data provider | Participant coverage (approx.) | Coding systems | Period of data currently available | Censoring date |
|---|---|---|---|---|---|
| England | TPP | 165,000 | See Resource 591 | 1938 onwards * | 31 May 2016 |
| England | Vision | 18,000 | See Resource 591 | 1940 onwards * | 31 May 2017 |
| Scotland | Vision/EMIS | 27,000 | See Resource 591 | 1939 onwards * | 31 March 2017 |
| Wales | Vision/EMIS | 21,000 | See Resource 591 | 1948 onwards * | 31 August 2017 |
* For each provider the number of records per year is very low initially and gradually increases.
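Because the censoring dates differ by GP dataset, pooled GP records are best censored per dataset rather than at a single date. The sketch below hard-codes the dates from the table above. It assumes each record has already been tagged with its (country, provider) dataset; the record layout, the constant and the function name are hypothetical, and how the provider is coded in the gp_clinical table should be checked against the UK Biobank documentation.

```python
from datetime import date

# Censoring dates from the table above, keyed by (country, data provider).
GP_CENSORING_DATES = {
    ("England", "TPP"): date(2016, 5, 31),
    ("England", "Vision"): date(2017, 5, 31),
    ("Scotland", "Vision/EMIS"): date(2017, 3, 31),
    ("Wales", "Vision/EMIS"): date(2017, 8, 31),
}


def censor_gp_records(records):
    """Drop GP records dated after their own dataset's censoring date.

    records: iterable of (eid, dataset, event_date) tuples, where dataset is
    a (country, provider) key of GP_CENSORING_DATES (hypothetical layout).
    """
    kept = []
    for eid, dataset, event_date in records:
        if event_date <= GP_CENSORING_DATES[dataset]:
            kept.append((eid, dataset, event_date))
    return kept


rows = [
    (1001, ("England", "TPP"), date(2016, 5, 31)),
    (1001, ("England", "TPP"), date(2016, 6, 1)),
    (1002, ("Wales", "Vision/EMIS"), date(2017, 8, 1)),
]
print(len(censor_gp_records(rows)))  # 2
```

Censoring TPP records at 31 May 2016 while Vision records run to 31 May 2017 avoids treating the later TPP months, for which data is not complete, as event-free time.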
Cancer data
ICD = International Classification of Diseases.

| Cancer | Data provider | ICD9 | ICD10 | Period of data currently available | Censoring date |
|---|---|---|---|---|---|
| England & Wales | NHS Digital | 1979–1994 | 1995 onwards | 1971 onwards | 31 March 2016 |
| Scotland | National Records of Scotland, NHS Central Register | 1980–1996 | 1997 onwards | 1957 onwards | 31 October 2015 |
Data available only for COVID-19 research
COVID-19 test results
For information on the censoring date for the COVID-19 test results data, see the Data Portal updates table on the timelines page. The censoring date is generally about one to two weeks before the "Last Update" date.
Primary care (GP) data - for COVID-19 research only
| GP dataset | Data provider | Participant coverage (approx.) | Coding systems | Period of data currently available | Censoring date |
|---|---|---|---|---|---|
| England | TPP | 190,000 | See Resource 3151 | 1938 onwards * | 31 May 2020 |
| England | EMIS | 260,000 | See Resource 3151 | 1938 onwards * | 30 November 2020 |
* For both providers the number of records per year is very low initially and gradually increases.