| Title: | Computationally efficient whole-genome signal region detection for quantitative and binary traits |
| Journal: | The Annals of Applied Statistics |
| Published: | 1 Jun 2025 |
| DOI: | https://doi.org/10.1214/25-aoas2029 |
In this Supplementary Material, we present the theoretical properties, the key lemmas, and the proofs of the lemmas and theorems in our work. In addition, all code and instructions for running the simulations and the real data analysis are included in the file "dBiRS_Code". The code is also available in the GitHub repository https://github.com/ZWCR7/dBiRS.

The identification of genetic signal regions in the human genome is critical for understanding the genetic architecture of complex traits and diseases. Numerous methods based on scan algorithms (e.g., QSCAN, SCANG, SCANG-STAAR) have been developed to allow dynamic window sizes in whole-genome association studies. Beyond scan algorithms, we have recently developed the binary and re-search (BiRS) algorithm, which is more computationally efficient than scan-based methods and exhibits superior statistical power. However, the BiRS algorithm is based on a two-sample mean test for binary traits and does not account for multidimensional covariates or nonbinary outcomes. In this work we propose a new maximal score test based on summary statistics computed from a generalized linear model, which accommodates regression-based statistics and allows testing of both continuous and binary outcomes. We then present a distributed version of the BiRS algorithm (dBiRS) that incorporates this new test. dBiRS computes blockwise results in parallel and aggregates them on a central machine, ensuring both detection accuracy and computational efficiency. It has theoretical guarantees for controlling familywise error rates and false discovery rates while maintaining the power advantages of the original algorithm. Applying dBiRS to detect genetic regions associated with fluid intelligence and prospective memory using whole-exome sequencing data from the UK Biobank, we validate previous findings and identify numerous novel rare variants near newly implicated genes. These discoveries offer valuable insights into the genetic basis of cognitive performance and neurodegenerative disorders, highlighting the potential of dBiRS as a scalable and powerful tool for whole-genome signal region detection.
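
As a rough illustration of the two ideas named in the abstract (a regression-based score statistic and a binary search over candidate regions), the following is a minimal sketch and not the authors' dBiRS implementation. It assumes a quantitative trait with a linear null model, uses a fixed threshold as a placeholder for the calibrated critical value described in the paper, and omits both the re-search step and the distributed aggregation. All function names (`score_z`, `binary_search_regions`, `merge_adjacent`) are hypothetical.

```python
import numpy as np

def score_z(y, X, G):
    """Standardized per-variant score statistics under the null model y ~ X.

    Assumption: quantitative trait with a linear null model (illustration only).
    y: (n,) trait, X: (n, q) covariates, G: (n, p) genotypes.
    """
    Xd = np.column_stack([np.ones(len(y)), X])          # add intercept
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)       # fit covariates only
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (len(y) - Xd.shape[1])     # residual variance
    # Project the covariates out of each genotype column.
    coef, *_ = np.linalg.lstsq(Xd, G, rcond=None)
    G_adj = G - Xd @ coef
    score = G_adj.T @ resid
    var = sigma2 * np.sum(G_adj ** 2, axis=0)
    return score / np.sqrt(var)

def binary_search_regions(z, lo, hi, threshold, min_len=1):
    """Recursively halve [lo, hi), keeping halves whose max |z| exceeds threshold.

    Assumption: `threshold` is a fixed placeholder, not a calibrated critical value.
    """
    if np.max(np.abs(z[lo:hi])) <= threshold:
        return []                                       # no signal in this block
    if hi - lo <= min_len:
        return [(lo, hi)]                               # smallest unit reached
    mid = (lo + hi) // 2
    return (binary_search_regions(z, lo, mid, threshold, min_len)
            + binary_search_regions(z, mid, hi, threshold, min_len))

def merge_adjacent(regions):
    """Merge detected half-open intervals that touch each other."""
    merged = []
    for lo, hi in regions:
        if merged and merged[-1][1] == lo:
            merged[-1] = (merged[-1][0], hi)
        else:
            merged.append((lo, hi))
    return merged
```

For example, with sixteen statistics that are zero except for a value of 5 at positions 5 and 6, `merge_adjacent(binary_search_regions(z, 0, 16, 3.0))` returns `[(5, 7)]`. Taking the maximum of the absolute statistics within a block lets a whole block be discarded in one comparison, which is what makes the halving strategy cheaper than scanning every window.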
| Application ID | Title |
|---|---|
| 79237 | Distribution and correlation free high-dimensional signal region detection with applications to whole genome association studies |