Abstract
Background
Genetic aberrations are among the critical driving factors of lung cancer. However, how genetic variation affects proteomic dysregulation at the population level, and whether that relationship can be used to characterize potential diagnostic biomarkers, requires additional investigation. Modeling such proteogenomic interactions is crucial for understanding early-stage biological disruptions and for informing biomarker discovery, successful clinical trials, and the development of effective therapeutics.
Methods
We investigated two complementary aspects of lung cancer risk. First, we performed a genome-wide association study of lung cancer using population-scale datasets, then examined whether lung cancer risk-associated variants influence plasma protein levels using the UK Biobank Pharma Proteomics Project data. Second, we identified plasma proteomic dysregulations in presymptomatic and symptomatic patients, with the objective of pinpointing diagnostic biomarkers using machine learning methods.
Results
Using the identified proteins, machine learning models achieved median cross-validated AUCs of 0.85-0.88 (0-4 years before diagnosis [YBD]), 0.81-0.84 (5-9 YBD), and 0.80-0.86 (0-9 YBD). In survival analyses within the 5-9 YBD group, elevated levels of eight proteins, including CALCB, PLAUR, and CD74, were significantly associated with lower survival. We identified 22 disease-associated proteins, of which 14 have been previously implicated in lung cancer (including CEACAM5, CXCL17, GDF15, and WFDC2) and 8 are novel. These proteins were enriched in pathways related to cytokine signaling, interleukin regulation, neutrophil degranulation, and lung fibrosis.
Conclusions
While these findings do not establish mechanistic causality, they highlight proteomic alterations that reflect systemic changes preceding diagnosis.
Our study contributes to understanding genome-proteome relationships in lung cancer and identifies circulating proteins warranting further investigation as potential early biomarkers for screening and risk stratification.</p>
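As a reading aid for the "median cross-validated AUC" figures quoted in the Results, the following minimal sketch shows one common way such a summary can be computed: a rank-based AUC is calculated on each held-out fold, and the median is taken across folds. The function names (`auc`, `median_cv_auc`) and the toy predictions are hypothetical and serve only as illustration; the abstract does not describe the study's actual models or validation scheme, so this should not be read as the authors' pipeline.

```python
from statistics import median


def auc(labels, scores):
    """Rank-based AUC: the fraction of (case, control) pairs in which the
    case receives the higher score, counting ties as half a win."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("each fold needs at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def median_cv_auc(folds):
    """folds: iterable of (labels, scores) pairs, one per held-out fold."""
    return median(auc(labels, scores) for labels, scores in folds)


# Toy held-out predictions, made up for illustration (not study data).
toy_folds = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.7, 0.2]),
    ([0, 1, 0, 1], [0.6, 0.6, 0.2, 0.9]),
]
result = median_cv_auc(toy_folds)
print(f"median cross-validated AUC: {result:.3f}")
```

Reporting the median rather than the mean makes the summary less sensitive to a single unusually easy or hard fold, which may be one reason the ranges above are given as medians.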