Abstract
For a complex trait, heritability ($h^2$) quantifies the proportion of its variation that is attributable to genetic factors. With the emergence of biobank-scale data, more efficient methods are needed to estimate $h^2$. Building on the framework of Haseman-Elston regression, we integrate a fast randomization algorithm to estimate $h^2$; the resulting method (RHE-reg) can handle biobank-scale data, such as the UK Biobank (UKB), very efficiently. Furthermore, we present an analytical solution that balances computational cost against the precision of the estimate, a property that is important when dealing with biobank-scale data. We investigated the performance of RHE-reg in simulated data and applied it to 81 UKB quantitative traits; in UKB data of nearly 300,000 unrelated individuals, each estimate took about 4.5 hours on average using 10 CPUs. We also extended RHE-reg to distributed datasets in a way that does not compromise privacy. In both UKB and simulated data, RHE-reg estimated $h^2$ accurately. The software for estimating SNP-heritability from biobank-scale data has been released.</p>
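To illustrate the general idea of combining Haseman-Elston regression with randomization, the following is a minimal sketch in Python, not the released software. It assumes a method-of-moments form of Haseman-Elston regression and uses a Hutchinson-style stochastic estimator for the trace of the squared genetic relationship matrix, so that the n x n matrix is never formed explicitly. The function names (`simulate`, `rhe_reg`), the parameter `n_vectors`, and the toy simulation settings are all hypothetical choices made for this illustration.

```python
import numpy as np


def simulate(n=2000, m=500, h2=0.5, seed=0):
    """Simulate standardized genotypes Z (n x m) and a phenotype y with heritability h2.

    Toy setup (an assumption for illustration): independent SNPs, additive effects.
    """
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.5, size=m)            # allele frequencies
    G = rng.binomial(2, freqs, size=(n, m)).astype(float)
    Z = (G - G.mean(axis=0)) / G.std(axis=0)         # column-standardize genotypes
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)  # SNP effects
    y = Z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return Z, y


def rhe_reg(Z, y, n_vectors=50, seed=0):
    """Estimate SNP-heritability by moment matching with a randomized trace estimate.

    With K = Z Z^T / m, the moment equations are
        tr(K^2) * sg + tr(K) * se = y^T K y
        tr(K)   * sg + n     * se = y^T y
    tr(K^2) is approximated as the mean of ||K b||^2 over random vectors b.
    """
    rng = np.random.default_rng(seed)
    n, m = Z.shape
    y = (y - y.mean()) / y.std()                     # standardize the phenotype

    Ky = Z @ (Z.T @ y) / m                           # K y without forming K
    yKy = y @ Ky
    yy = y @ y
    tr_K = np.sum(Z ** 2) / m                        # exact tr(K)

    B = rng.standard_normal((n, n_vectors))          # random probe vectors
    KB = Z @ (Z.T @ B) / m                           # K applied to every probe
    tr_K2 = np.sum(KB ** 2) / n_vectors              # stochastic estimate of tr(K^2)

    A = np.array([[tr_K2, tr_K], [tr_K, float(n)]])
    rhs = np.array([yKy, yy])
    sg, se = np.linalg.solve(A, rhs)                 # genetic and residual variance
    return sg / (sg + se)


Z, y = simulate(n=2000, m=500, h2=0.5, seed=1)
print("estimated h2:", rhe_reg(Z, y, n_vectors=50, seed=2))
```

In this sketch, `n_vectors` is the knob that trades computation for precision: more random vectors reduce the noise in the trace estimate at the cost of additional matrix-vector products.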