Comparing different methods of estimating GWAS heritability with a new approach using only summary statistics
Abstract
To date, SNP heritability (<inline-formula> <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="560406v1_inline1.gif"/> </inline-formula>; variance explained by all SNPs used in a genome-wide association study) has accounted for most of the genetic variation in many traits, but a gap remains between GWAS heritability (<inline-formula> <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="560406v1_inline2.gif"/> </inline-formula>; variance explained by genome-wide significant SNPs) and <inline-formula> <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="560406v1_inline3.gif"/> </inline-formula>, which is termed hidden heritability.
There are several methods for estimating <inline-formula> <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="560406v1_inline4.gif"/> </inline-formula>: linear mixed model (LMM), polygenic risk score (PRS), multiple linear regression (MLR) and simple linear regression (SLR). However, it is unclear which methods are more accurate under different circumstances. This study proposes a PRS-based method for estimating <inline-formula> <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="560406v1_inline5.gif"/> </inline-formula> that uses pseudo summary statistics (PRS-PSS). It compares this method with existing methods on both simulated and real data (10 traits from the UK Biobank, UKBB) to determine when their estimates are realistic and can be trusted as final.
Simulation results showed that PRS-based methods underestimate <inline-formula> <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="560406v1_inline6.gif"/> </inline-formula> by about 20% when all causal SNPs are considered, but are relatively accurate when a subset of causal SNPs is used. Their performance is much better than that of the SLR method for all 10 traits, although on real data they show no stable trend of overestimation or underestimation relative to the base model (LMM).
My suggestion is to use LMM or the adjusted R² from MLR for reporting <inline-formula> <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="560406v1_inline7.gif"/> </inline-formula> when an independent data set is available. When only summary statistics are available, PRS-PSS is a relatively accurate alternative, especially compared to SLR, which tends to overestimate <inline-formula> <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="560406v1_inline8.gif"/> </inline-formula> by 20-50% when applied to real data.
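As an illustration of the two regression-based estimators named above, the following minimal sketch computes an adjusted R² from a joint regression on all selected SNPs (MLR) and, for comparison, a sum of per-SNP squared correlations (one reading of the SLR approach). The function names, the toy genotype simulation and the "sum of r²" definition of SLR are assumptions made for this example; they are not taken from the study and do not reproduce its simulations or results.

```python
import numpy as np

def adjusted_r2_mlr(G, y):
    """Adjusted R^2 from regressing y jointly on all columns of G (MLR)."""
    n, p = G.shape
    X = np.column_stack([np.ones(n), G])          # add an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares fit
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    # Penalise for the number of predictors p
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def summed_r2_slr(G, y):
    """Sum of squared correlations, one simple regression per SNP.

    Assumption: this is only one possible definition of an SLR-based estimate.
    """
    r = np.array([np.corrcoef(G[:, j], y)[0, 1] for j in range(G.shape[1])])
    return float((r ** 2).sum())

# Hypothetical toy data: 2000 individuals, 10 independent SNPs coded 0/1/2
rng = np.random.default_rng(0)
n, p = 2000, 10
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
effects = rng.normal(0.0, 0.1, size=p)            # assumed per-SNP effects
y = G @ effects + rng.normal(0.0, 1.0, size=n)    # phenotype = signal + noise

h2_mlr = adjusted_r2_mlr(G, y)
h2_slr = summed_r2_slr(G, y)
print(f"MLR adjusted R^2: {h2_mlr:.4f}")
print(f"SLR summed r^2:   {h2_slr:.4f}")
```

The adjustment term (n - 1) / (n - p - 1) shrinks the raw R² as more SNPs enter the model. The summed per-SNP r² has no such correction and ignores correlation between SNPs.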