Description
Whole genome sequencing data are publicly available for 200,000 participants, with release of data for the full cohort anticipated in late 2023.
A total of 500,000 participants were sequenced between 2018 and 2021 using Illumina NovaSeq technology. Sequencing was completed in two phases:
- The Vanguard Phase of 50,000 samples, with sequencing performed at the Wellcome Sanger Institute and bioinformatics provided by Seven Bridges.
- The Main Phase of 450,000 samples, with sequencing divided between the Wellcome Sanger Institute and deCODE Genetics. Bioinformatics was provided by Seven Bridges for samples sequenced at the Wellcome Sanger Institute, and in-house for those sequenced at deCODE Genetics.
The whole genome sequencing data were quality controlled. Direct output files from some of the quality confirmation steps are available in Category 180, and specific metrics in tabular format are available in Category 187.
Joint variant calling was performed by deCODE Genetics on the first 150,000 Main Phase samples, and subsequently repeated to include the 50,000 samples sequenced in the Vanguard Phase. Details of the joint variant calling and the Main Phase of the sequencing project are available in Halldorsson et al., Nature 607, 732-740 (2022).
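The sample counts quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of any official tooling; the variable and function names are hypothetical, and the only inputs are the figures stated in this description. It suggests that the 200,000 released participants match the size of the joint-called set (150,000 Main Phase plus 50,000 Vanguard Phase samples), although the description does not state that link explicitly.

```python
# Sample counts as quoted in the description (no external data is used).
PHASES = {"Vanguard": 50_000, "Main": 450_000}
TOTAL_SEQUENCED = 500_000
RELEASED = 200_000

# Samples covered by the joint variant calling described above:
# the first 150,000 Main Phase samples plus all 50,000 Vanguard Phase samples.
JOINT_CALLED = {"Main": 150_000, "Vanguard": 50_000}


def summarize():
    """Return derived totals based only on the quoted figures."""
    sequenced = sum(PHASES.values())
    joint_called = sum(JOINT_CALLED.values())
    return {
        "sequenced": sequenced,
        "joint_called": joint_called,
        # Main Phase samples outside the joint call described here.
        "main_outside_joint_call": PHASES["Main"] - JOINT_CALLED["Main"],
        "fraction_joint_called": joint_called / sequenced,
    }


summary = summarize()
print(summary)
```

Under these figures, the two phases add up to the full 500,000-sample cohort, and the joint-called set covers 40% of it.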