Abstract
Most genome-wide association study (GWAS) analyses test the association between single-nucleotide polymorphisms (SNPs) and a single trait or outcome. While valuable second-step analyses of these associations (e.g., calculating genetic correlations between traits) are common, single-step multivariate analyses of GWAS data are rarely performed. This is unfortunate because multivariate analyses can reveal information that is irrevocably obscured in multi-step analyses. One simple example is the distinction between variance common to a set of measures and variance specific to each measure. Neither GWAS of sum or factor scores nor GWAS of the individual measures will deliver a clean picture of the loci associated with each measure's specific variance. While multivariate GWAS opens up a broad new landscape of feasible and informative analyses, its adoption has been slow, likely because of its heavy computational demands and the difficulty of specifying the required models. Here we describe GW-SEM 2.0, which is designed to simplify model specification and overcome the computational challenges inherent in multivariate GWAS. In addition, GW-SEM 2.0 allows users to accurately model ordinal items, which are common in behavioral and psychological research, within a GWAS context. This new release enhances computational efficiency, allows users to select the fit function appropriate for their analyses, expands compatibility with standard genomic data formats, and outputs results in a form that can be read directly into other standard post-GWAS processing software. To demonstrate GW-SEM's utility, we conducted (1) a series of GWAS using three substance use frequency items from the UK Biobank, (2) a timing study for several predefined GWAS functions, and (3) a Type I error rate study.
Our multivariate GWAS analyses emphasize the utility of GW-SEM for identifying novel patterns of associations that vary considerably between genomic loci for specific substances, highlighting the importance of differentiating between substance-specific use behaviors and polysubstance use. The timing study demonstrates that the analyses take a reasonable amount of time and shows the cost of including additional items. The Type I error rate study demonstrates that p-values from hypothesis tests of genetic associations in latent variable models follow the uniform distribution expected under the null hypothesis. Taken together, these results suggest that GW-SEM may provide substantially deeper insights into the underlying genomic architecture of multivariate behavioral and psychological systems than is currently possible with standard GWAS methods. The current release of GW-SEM 2.0 is available on CRAN (stable release) and GitHub (beta release), and tutorials are available on our GitHub wiki (https://jpritikin.github.io/gwsem/).