Category 180
Genomics ⏵ Whole genome sequences

Description

Whole genome sequencing data for 500,000 participants were released in late 2023, following an earlier public release covering the first 200,000 participants in late 2021.

500,000 participants were sequenced between 2018 and 2021 using Illumina NovaSeq technology. Sequencing was completed in two phases:

The Vanguard phase of 50,000 samples, with sequencing performed at Wellcome Sanger Institute and bioinformatics provided by Seven Bridges.
The Main phase of 450,000 samples, with sequencing divided between Wellcome Sanger Institute and deCODE Genetics. Bioinformatics was provided by Seven Bridges for samples sequenced at Wellcome Sanger Institute, and in-house for those sequenced at deCODE Genetics.

The genomes were originally processed and joint-called using BWA-MEM and GATK; this version of the dataset is provided in Category 270. Subsequently, the data were reprocessed using DRAGEN 3.7.8, with individual and joint-called data available in Category 185.

The whole genome sequencing data were quality controlled; direct output files from some quality confirmation steps are available for each pipeline version in the respective categories, and specific metrics in tabular format are made available in Category 187.

Data from previous releases are available in Category 186.