Abstract
Several recent large-scale Illumina whole-genome sequencing (WGS) and whole-exome sequencing (WES) projects have provided exceptional opportunities to discover mobile element insertions (MEIs) and to study their impact on human genomes. However, these projects have also presented major challenges in scalability and computational cost when MEI discovery is performed on tens or even hundreds of thousands of samples. To meet these challenges, we developed CloudMELT, a more efficient and scalable version of our Mobile Element Locator Tool (MELT). We then used MELT and CloudMELT to perform MEI discovery in 57,919 human genomes and exomes, discovering 104,350 nonredundant MEIs. We used this collection to examine (1) the potentially active L1 source elements that drive the mobilization of new Alu, L1, and SVA MEIs in humans; (2) the population distributions and subfamilies of these MEIs; and (3) the mutagenesis of GENCODE genes, ENCODE-annotated features, and disease genes by these MEIs. Our study provides new insights into the L1 source elements that drive MEI mutagenesis and improves our understanding of how this mutagenesis affects human genomes.