Abstract
Aims: Timely, accurate assessment of electrocardiograms (ECGs) is crucial for diagnosing, triaging, and managing patients. However, this often relies on expert interpretation, a major bottleneck in low-resource settings. We developed and validated ECG-GPT, a format-independent vision encoder-decoder model that generates expert-level interpretations from 12-lead ECG images.
Methods and results: We developed ECG-GPT using 12-lead ECGs performed at a large US health system between 2000 and 2022, paired with their corresponding diagnosis statements. Using structured clinical assessment, semantic similarity, and conventional metrics, we validated ECG-GPT across seven distinct health settings: three large and diverse US health systems; ECGs from Minas Gerais, Brazil; the UK Biobank; the Germany-based PTB-XL dataset; and a community hospital in Missouri. In total, 2.9 million ECGs were used for model development and 4.1 million ECGs for validation. The model performed well in clinical assessment across 26 extracted labels, with diagnostic accuracy ranging from 0.93 to 0.99. For rhythm abnormalities, including atrial fibrillation, sinus tachycardia, sinus bradycardia, premature atrial contractions, and premature ventricular contractions, AUROCs ranged from 0.80 to 0.95. For conduction abnormalities, including left bundle branch block, right bundle branch block, first-degree atrioventricular block, left anterior fascicular block, and left posterior fascicular block, AUROCs ranged from 0.88 to 0.96. ECG-GPT identified the full context of diagnosis statements with allied conditions with a median pairwise similarity of 0.90, significantly greater than baseline (P < 0.001). Results were comparable across external validation sites.
Conclusion: We developed and validated a vision encoder-decoder model that generates expert-level interpretations from ECG images, a scalable strategy for accessible automated ECG analysis.