Accessing UK Biobank data guide
The following document provides guidance on how to download the different types of UK Biobank data:
Data access guide
Please see the Understanding UK Biobank page for information about the data available, which also includes further information and reports about linkage data (Cancer, Hospital Inpatient, Death).
See also the timelines page for information about recent updates.
Data Dictionaries and Encodings
Data Dictionary of Showcase fields:
List of Data-Codings (including those for the Data Portal):
UKB Synthetic Dataset
A synthetic version of the UK Biobank dataset has been created to allow large-scale system testing using data that is comparable in size and constitution to the real dataset.
Further details regarding the methods used to create this resource and links to file downloads can be found on that page.
Size of the Core Dataset
The core dataset consists of the categories shown in the Quick Start section of Showcase when a basket is first created:
As an illustration of the potential size of a downloaded UK Biobank main dataset, the sizes of the various files generated from the core dataset are given in the table below:
| File type | ukbconv option | Dataset size | File extension |
In addition, the R .tab file is accompanied by a 511 KB .r script, the SAS .sd2 file by a 1.3 MB .sas script, and the Stata .raw file by a 501 KB .do script and a 961 KB .dct file.
Note that the large difference in size between the tsv .txt file and the (also tab-separated) R .tab file is due to empty fields being represented by the empty string in the former and by NA in the latter. Similarly, all fields are quoted in the .csv file, with empty fields appearing as "", which accounts for its additional size compared to the .txt file.
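The effect of these representations on file size can be illustrated with a minimal Python sketch. It is not how ukbconv writes its output; the `render` helper and the two sample rows (including the participant IDs and values) are made up purely for illustration. The same rows are written three ways: tab-separated with empty fields left empty (as in the .txt file), tab-separated with empty fields written as NA (as in the R .tab file), and comma-separated with every field quoted (as in the .csv file).

```python
import csv
import io


def render(rows, delimiter, empty, quote_all):
    """Serialise rows to text; None marks an empty field (hypothetical helper)."""
    buf = io.StringIO()
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    writer = csv.writer(buf, delimiter=delimiter, quoting=quoting,
                        lineterminator="\n")
    for row in rows:
        # Substitute the chosen placeholder for every empty field.
        writer.writerow([empty if value is None else value for value in row])
    return buf.getvalue()


# Made-up sample rows: an ID followed by three mostly empty fields.
rows = [
    ["1000001", None, None, "1"],
    ["1000002", "2", None, None],
]

as_txt = render(rows, "\t", "", quote_all=False)    # empty string, like .txt
as_tab = render(rows, "\t", "NA", quote_all=False)  # NA, like the R .tab file
as_csv = render(rows, ",", "", quote_all=True)      # all fields quoted, like .csv

for name, text in [("txt", as_txt), ("tab", as_tab), ("csv", as_csv)]:
    print(name, len(text.encode("utf-8")), "bytes")
```

Because most fields in a typical dataset are empty, the two extra characters that NA or "" adds to each empty field accumulate across every row and column.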
Information about the sizes of bulk data items such as MRI images can be found in section 8.4 of the Data access guide above. That guide also links to documents giving the size of the Genotype data (section 4.1) and the Exome data (section 4.3).
* Please note that the tsv version of the converted file currently has a glitch: every row except the first starts with an extra tab, which throws off the column alignment. If you intend to use this option, you will need to correct the file yourself by removing the extra tab from each affected row. Alternatively, use the R option, which also produces a tab-separated file, and disregard the accompanying R script. (Note, however, that empty fields will appear as NA rather than as the empty string in the resulting file.) We are currently looking into correcting this problem.
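One possible way to correct the extra-tab problem is sketched below in Python. This is an unofficial illustration, not a UK Biobank tool: the function names, the sample rows and the file names in the usage note are hypothetical, and the sketch assumes the glitch is exactly as described above (a single extra tab at the start of every row after the first). Check the result against the header row before relying on it.

```python
from typing import Iterable, Iterator


def fix_leading_tabs(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines with one leading tab removed from every row after the first."""
    for index, line in enumerate(lines):
        if index > 0 and line.startswith("\t"):
            # Remove exactly one tab; any further tabs are real empty fields.
            yield line[1:]
        else:
            # The header row (and any row without the extra tab) is unchanged.
            yield line


def fix_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path, correcting the column alignment."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        dst.writelines(fix_leading_tabs(src))


# Made-up sample showing the misalignment and its correction.
sample = ["eid\t31-0.0\n", "\t1000001\t1\n", "\t1000002\t\n"]
fixed = list(fix_leading_tabs(sample))
print(fixed)
```

A call such as `fix_file("dataset.txt", "dataset_fixed.txt")` (both file names are placeholders) processes the file line by line, so the whole dataset never has to fit in memory.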