Accessing Returned Data within UK Biobank

When an Application completes, the researchers involved are obliged to send a summary of their results and any derived measures back to UK Biobank for distribution to other projects. The ukblink client has been developed to allow Approved researchers to download these Returned Datasets and to link their individually pseudonymised contents to the researcher's own current Application (known here as 'bridging'). This guide explains how to use the ukblink utility, which can be obtained from the Downloads section of the UK Biobank Showcase website.

To acquire Returned Datasets from the UK Biobank secure online repository services, a researcher must:

  1. be a validated UK Biobank researcher;
  2. be part of an Approved Application;
  3. have been issued a standard dataset together with the associated password credentials;
  4. have included the desired Returned Datasets in an approved Basket.

This webpage details the means by which Returned Data held by UK Biobank can be accessed and manipulated once this access has been approved.

  1. Preparation
  2. Notices
  3. Authentication
  4. Fetching Data

1. Preparation

Following approval of a research application, researchers will be sent a 32-character MD5 Checksum and a 64-character password. The next step is to acquire the ukblink utility from the Downloads section of the Showcase website.

Some of the UKB utilities are supplied pre-compiled for both MS-Windows and Linux systems. The MS-Windows utilities have the suffix .exe; the explanations given in this guide omit this for generality. All the utility programs are command-line tools, so Windows versions are best run from a Command Prompt window, and Linux versions are best run directly from a Terminal.

The repository consists of a pair of mirrored systems each connected to the UK JANET network by independent links. The system names are:

To access the repository from a remote computer, the system on which the download utility is running must be able to make HTTP (port 80) connections to at least one, and preferably both, of the repository systems. If this is not possible, researchers should contact their local IT team to resolve the issue.
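As a rough way to check this requirement before starting a download, the sketch below attempts a plain TCP connection on port 80. It is an illustration only: the function name can_reach is hypothetical and is not part of any UK Biobank tool, and the host name must be supplied by the user.

```python
import socket


def can_reach(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and opens the socket;
        # the "with" block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling can_reach once for each repository system name shows whether at least one of them is reachable from the local machine. A False result only indicates that the connection could not be opened; it does not say why, so the local IT team may still need to investigate.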

2. Notices

Before downloading any data, researchers are reminded that participants occasionally withdraw from UK Biobank, so a Returned Dataset may contain elements relating to people who are no longer part of the study. The mapping file will not contain identifiers for such people, so any obsolete individual information relating to them is nullified.
Note also that, while it is possible to run multiple downloads in parallel, the system ensures fair usage by not permitting a single Application to run more than 10 downloads simultaneously.
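One way to stay within this limit when scripting several downloads is to cap the number of worker threads. The sketch below is a minimal illustration, not an official tool: the names fetch_dataset and fetch_all are hypothetical, and it assumes that ukblink is on the PATH and that the default keyfile .ukbkey is in use. What the exit status of ukblink means is not documented in this guide, so it is simply returned to the caller.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 10  # limit per Application stated in this guide


def fetch_dataset(dataset_id):
    """Run `ukblink -r<ID>` and return its exit status (meaning not assumed)."""
    return subprocess.run(["ukblink", f"-r{dataset_id}"]).returncode


def fetch_all(dataset_ids, fetch=fetch_dataset, max_parallel=MAX_PARALLEL):
    """Fetch each dataset, never running more than MAX_PARALLEL at once."""
    ids = list(dataset_ids)
    # Clamp the requested parallelism to the range 1..MAX_PARALLEL.
    workers = max(1, min(max_parallel, MAX_PARALLEL))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with ids.
        return dict(zip(ids, pool.map(fetch, ids)))
```

The fetch parameter lets a different download function be supplied, for example to pass a keyfile or to try the scheduling logic without contacting the repository.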

3. Authentication

To access the repository it is necessary to prove one's identity to the system using a keyfile. See Resource 667 for detailed information on this.

4. Fetching Data

Returned datasets are identified by ID numbers (given on the UKB Showcase) and originate with specific Applications. To make use of such information, a researcher must download both the returned dataset itself and a bridging file linking the participant IDs in the returned data to those of the current Application.

4.1 Using ukblink

The ukblink utility can be used with various flags to retrieve either returned datasets or bridging files. The form of the ukblink command is:
  ukblink  [-bapp_id]  [-rdataset_id]  [-aauthentication_keyfile] [-v] [-h]
where the flags are as follows:

-a  Specifies the authentication keyfile containing the Application ID and truncated password. This flag is optional and is not required if the default authentication file name (.ukbkey) has been used.
-b  Specifies the Application ID corresponding to the Returned Dataset for which a bridging file is to be created. This Application ID can be found on the relevant Returns page on the Showcase. Note that an error will be generated if the Returned Dataset does not contain individual-level information.
-r  Specifies the Returned Dataset ID to fetch.
-h  Shows a basic help message.
-v  Specifies that output should be verbose (useful for tracing errors).

Either -b or -r must be present, but not both.
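When ukblink is called from a script, the flag rules above can be checked before the program is launched. The following sketch assembles the argument list in the attached-value style used in this guide. The function name build_ukblink_args and its parameter names are hypothetical; only the flags themselves come from the description above.

```python
def build_ukblink_args(dataset_id=None, bridge_app=None, keyfile=None, verbose=False):
    """Return the argument list for one ukblink call.

    Exactly one of dataset_id (-r) and bridge_app (-b) must be supplied.
    """
    if (dataset_id is None) == (bridge_app is None):
        raise ValueError("supply exactly one of dataset_id (-r) or bridge_app (-b)")
    args = ["ukblink"]
    if keyfile is not None:
        args.append(f"-a{keyfile}")  # omitted when the default .ukbkey is used
    if dataset_id is not None:
        args.append(f"-r{dataset_id}")
    else:
        args.append(f"-b{bridge_app}")
    if verbose:
        args.append("-v")
    return args
```

The returned list can be passed directly to subprocess.run, which avoids any shell quoting issues with keyfile names.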

As an example, suppose the authentication keyfile .ukbkey exists. To retrieve Returned Dataset 1234, enter the following:

  ukblink -r1234
which will create the file ukbreturn1234.typ on the local disk where typ is an extension appropriate to the type of file. On failure the program will output an error message.
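Because the extension is not known in advance, a script that processes the download has to look the file up by its stem. The sketch below shows one way to do so. The function name find_returned_file is hypothetical, and it assumes the file has been written to the directory given (by default the current working directory), which this guide does not state explicitly.

```python
from pathlib import Path


def find_returned_file(dataset_id, directory="."):
    """Return sorted paths named ukbreturn<ID>.<extension> in directory."""
    # The literal dot after the ID stops 1234 from matching 12345.
    return sorted(Path(directory).glob(f"ukbreturn{dataset_id}.*"))
```

An empty list means that no matching file was found, which may indicate that the download failed or that the file was written elsewhere.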

As a further example, suppose the authentication keyfile is called mykey.ukb and the user wishes to link identifiers between the current Application (as identified in the keyfile, say 567) and Application 890, then they would enter the following:

 ukblink -amykey.ukb -b890
which will create the file ukb567bridge890.txt. This file contains two columns, with each row listing the identifier for a particular participant under the schemes for Applications 567 and 890.
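Once both files are available, the bridging file can be used to relabel the returned data with the current Application's identifiers. The sketch below is an illustration under stated assumptions: it assumes the columns are separated by whitespace and that the current Application's identifier comes first, neither of which is confirmed by this guide, so both should be checked against the actual file. The function names and all identifiers shown are made up.

```python
def load_bridge(lines):
    """Map other-Application IDs to current-Application IDs.

    Assumes whitespace-separated rows with the current Application's ID
    first and the other Application's ID second (an unconfirmed assumption).
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed rows
        current_id, other_id = fields
        mapping[other_id] = current_id
    return mapping


def relabel(rows, mapping):
    """Yield (current_id, value) pairs, dropping IDs absent from the bridge."""
    for other_id, value in rows:
        if other_id in mapping:
            yield mapping[other_id], value


# Made-up identifiers and values, for illustration only.
bridge = load_bridge(["1000001 2000009", "1000002 2000004"])
returned = [("2000004", 3.2), ("2000009", 1.7), ("2000555", 9.9)]
print(list(relabel(returned, bridge)))
# prints [('1000002', 3.2), ('1000001', 1.7)]
```

Rows whose identifier does not appear in the bridging file, such as the third row here, are dropped, which is how records for withdrawn participants fall out of the linked data. In practice load_bridge would be given an open file handle for ukb567bridge890.txt rather than a list of strings.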