Abstract
Over the last two decades of genome-wide association studies (GWAS), nicotine-dependence-related genetic loci (e.g., nicotinic acetylcholine receptor (nAChR) subunit genes) have been among the most replicable genetic findings. Although GWAS have reported tens of thousands of SNPs within these loci, further analysis (e.g., fine-mapping) is required to identify the causal variants. However, it is computationally challenging for existing fine-mapping methods to reliably identify causal variants from thousands of candidate SNPs based on the posterior inclusion probability. To address this challenge, we propose a new method that selects SNPs by jointly modeling the SNP-wise inference results and the underlying structured network patterns of the linkage disequilibrium (LD) matrix. We use an adaptive dense subgraph extraction method to recognize the latent network patterns of the LD matrix and then apply group LASSO to select causal variant candidates. We applied this method to UK Biobank data to identify causal variant candidates for nicotine addiction. Eighty-one nicotine addiction-related SNPs (i.e., -log(p) > 50) in the nAChR subunit genes were selected; they are highly correlated (average r² > 0.8) even though they are physically distant (e.g., >200 kilobases apart) and located in different genes. These findings reveal that distant SNPs from different genes can show higher LD r² than their neighboring SNPs and can jointly contribute to a complex trait such as nicotine addiction.</p>