Abstract
Genome-wide association studies have identified thousands of genetic variants associated with non-small cell lung cancer (NSCLC); however, it remains challenging to determine the causal variants and to improve disease risk prediction. Here, we applied massively parallel reporter assays to perform NSCLC variant-to-function mapping at scale. A total of 1249 candidate variants were evaluated, and 30 potential causal variants within 12 loci were identified. Accordingly, we proposed three genetic architectures underlying NSCLC susceptibility: multiple causal variants in a single haplotype block (e.g. 4q22.1), multiple causal variants in multiple haplotype blocks (e.g. 5p15.33), and a single causal variant (e.g. 20q11.23). We developed a modified polygenic risk score using the potential causal variants from Chinese populations, improving the performance of risk prediction in 450,821 Europeans from the UK Biobank. Our findings not only augment the understanding of the genetic architecture underlying NSCLC susceptibility but also provide a strategy to advance NSCLC risk stratification.