Abstract
Conventional approaches that analyze electrocardiograms (ECG) as discrete parameters (such as the PR interval) ignore the high dimensionality of the data and omit subtle but relevant information. We applied a variational auto-encoder to learn the underlying distributions of the ECGs of 41,927 UK Biobank participants, generating a 32-dimensional representation (latent factors). The latent factors correlated with conventional ECG parameters and were strongly associated with cardiac phenotypes estimated from magnetic resonance imaging. We found definitive associations of the latent factors with conduction, rhythm, and structural disorders (all p < 4.51 × 10⁻³⁰⁸), as well as added value in mortality prediction. Genome-wide association studies (GWAS) of the latent factors revealed 170 genetic loci, of which 29 were not previously associated with electrocardiographic phenotypes. Further characterization of the genetic signals suggested involvement in cardiac development, contractility, and electrophysiology. Our results support that deep representation learning of the 12-lead ECG can provide clinically meaningful and interpretable insights into cardiovascular biology and health.