It’s a wrap: deriving distinct discoveries with FDR control after a GWAS pipeline


Abstract

The standard analysis pipeline for genome-wide association studies (GWAS) is based on marginal tests of association. These tests are computationally convenient and portable, but the discoveries resulting from their rejections are not immediately interpretable and require post-processing such as “clumping” and “fine mapping.” An interesting alternative is provided by conditional independence hypotheses: their rejections lead to the identification of distinct signals across the genome, account for measured confounders, and point to separate causal pathways.
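
To make the difference between marginal and conditional hypotheses concrete, the toy simulation below (not from the paper) measures two correlated variants, only one of which affects the trait. As simplifying assumptions, continuous Gaussian variables stand in for genotypes, and an ordinary least-squares t-test in a joint linear model stands in for a conditional independence test.

```python
import numpy as np


def marginal_z(X, y):
    """Marginal association z-scores: sqrt(n) times the correlation of each column with y."""
    n = X.shape[0]
    Xs = (X - X.mean(0)) / X.std(0)
    ys = (y - y.mean()) / y.std()
    return Xs.T @ ys / np.sqrt(n)


def joint_t(X, y):
    """t-statistics from a joint least-squares fit: each variant is tested given the others."""
    n, p = X.shape
    Xc = np.column_stack([np.ones(n), X])  # intercept plus all variants
    XtX_inv = np.linalg.inv(Xc.T @ Xc)
    beta = XtX_inv @ Xc.T @ y
    resid = y - Xc @ beta
    sigma2 = resid @ resid / (n - p - 1)  # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    return (beta / se)[1:]  # drop the intercept


rng = np.random.default_rng(0)
n, rho = 5000, 0.8
x1 = rng.standard_normal(n)                                   # causal variant
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # correlated proxy variant
y = 0.3 * x1 + rng.standard_normal(n)                         # only x1 affects the trait
X = np.column_stack([x1, x2])

z_marg = marginal_z(X, y)
t_joint = joint_t(X, y)
print("marginal z:   ", np.round(z_marg, 1))
print("conditional t:", np.round(t_joint, 1))
```

Both marginal z-scores are large, because the proxy inherits the signal through its correlation with the causal variant; conditionally on the causal variant, the proxy's t-statistic is typically near zero, which is the kind of distinct signal a conditional hypothesis isolates.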

An obstacle to the wide adoption of this approach has been that it requires access to individual-level data. Overcoming this barrier, recent work has shown how summary statistics resulting from the standard marginal GWAS analysis can be used as input to a procedure that tests conditional independence hypotheses while controlling the false discovery rate (FDR). This secondary analysis requires sampling synthetic negative controls (knockoffs) from a distribution determined by the linkage disequilibrium patterns in the genome of the population under study. In prior work, we pre-computed this distribution for European genomes, starting from information derived from the UK Biobank. Thus, researchers working with GWAS in a European population can carry out a knockoff analysis at minimal computational cost, using the distributed routine GhostKnockoffGWAS.
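
For intuition about how knockoffs can be generated from summary statistics alone, here is a minimal single-variant sketch of the general "ghost knockoff" idea; it does not reproduce GhostKnockoffGWAS, which works with groups of variants and other refinements. It assumes marginal Z-scores follow Z ~ N(Σa, Σ), where Σ is the LD (correlation) matrix and a_j = 0 for conditionally null variants, and it draws knockoff Z-scores from Z̃ | Z ~ N(PZ, V), with P = I − SΣ⁻¹ and V = 2S − SΣ⁻¹S. The equicorrelated choice S = sI, the AR(1) matrix used as a stand-in for LD, the statistic W_j = Z_j² − Z̃_j², and the knockoff+ threshold are all illustrative assumptions.

```python
import numpy as np


def equicorrelated_s(Sigma):
    """Equicorrelated knockoff choice S = s*I with s = min(1, 2*lambda_min(Sigma))."""
    return min(1.0, 2.0 * np.linalg.eigvalsh(Sigma)[0])


def ghost_knockoff_params(Sigma, s):
    """Mean map P and covariance V of knockoff Z-scores: Z_tilde | Z ~ N(P Z, V)."""
    p = Sigma.shape[0]
    S = s * np.eye(p)
    S_Sinv = S @ np.linalg.inv(Sigma)  # S Sigma^{-1}
    P = np.eye(p) - S_Sinv
    V = 2.0 * S - S_Sinv @ S
    return P, V


def sample_knockoff_z(Z, P, V, rng):
    """Draw knockoff Z-scores; V is factored by eigendecomposition (tiny negative eigenvalues clipped)."""
    evals, evecs = np.linalg.eigh(V)
    root = evecs * np.sqrt(np.clip(evals, 0.0, None))  # root @ root.T == V
    return P @ Z + root @ rng.standard_normal(len(Z))


def knockoff_plus_threshold(W, q):
    """Smallest t > 0 with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q; inf if none exists."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf


rng = np.random.default_rng(1)
p, rho, q = 500, 0.5, 0.1
idx = np.arange(p)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])  # AR(1) stand-in for an LD matrix
a = np.zeros(p)
causal = rng.choice(p, size=25, replace=False)
a[causal] = rng.choice([-6.0, 6.0], size=25)
Z = rng.multivariate_normal(Sigma @ a, Sigma)  # simulated marginal Z-scores

s = equicorrelated_s(Sigma)
P, V = ghost_knockoff_params(Sigma, s)
Z_tilde = sample_knockoff_z(Z, P, V, rng)

W = Z**2 - Z_tilde**2  # large positive W_j: the original beats its knockoff
T = knockoff_plus_threshold(W, q)
selected = np.flatnonzero(W >= T)
fdp = np.mean(a[selected] == 0) if selected.size else 0.0
print(len(selected), "selected; false discovery proportion =", round(fdp, 3))
```

By construction, Z̃ has the same covariance Σ as Z and covariance Σ − S with it, so swapping a null Z_j with its knockoff leaves the joint distribution unchanged; that exchangeability is what lets the knockoff+ threshold control the FDR at level q.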

Here we introduce and release new software (solveblock) that extends this capability to a much richer collection of studies. Given a set of genotyped samples, or a reference dataset, our pipeline efficiently estimates the high-dimensional correlation matrices that describe dependencies across the genome, under fairly common sparsity assumptions. Taking this sample-specific estimate as input, the software identifies groups of highly correlated genetic variants and uses them to define an appropriate resolution for the conditional independence hypotheses. Finally, it computes the distribution of the exchangeable negative controls needed to test these hypotheses. The output of solveblock can be passed directly to GhostKnockoffGWAS, allowing users to carry out the complete analysis in a two-step procedure.
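
The three steps above (estimate correlations, group correlated variants, compute the knockoff distribution) can be sketched as follows. The abstract does not specify solveblock's estimator, grouping rule, or optimization, so this sketch swaps in simple stand-ins: a fixed shrinkage of the sample correlation toward the identity (the `shrink` weight is a hypothetical parameter), average-linkage clustering on 1 − |r| cut at a hypothetical `min_corr`, and the equicorrelated construction for group knockoffs, S = γ·blockdiag(Σ_g) with γ = min(1, 2λ_min(D^{-1/2} Σ D^{-1/2})). The genotypes are simulated toy data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def shrunk_correlation(G, shrink):
    """Sample correlation of G (samples x variants) shrunk toward the identity (hypothetical stand-in)."""
    R = np.corrcoef(G, rowvar=False)
    return (1.0 - shrink) * R + shrink * np.eye(R.shape[0])


def group_variants(R, min_corr=0.5):
    """Average-linkage clustering on 1 - |r|: each merge joins clusters whose average |r| >= min_corr."""
    dist = 1.0 - np.abs(R)
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=1.0 - min_corr, criterion="distance")


def group_equi_S(Sigma, groups):
    """Group equicorrelated choice: S = gamma * D, D = blockdiag(Sigma_g), so that 2*Sigma - S is PSD."""
    D = np.zeros_like(Sigma)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        D[np.ix_(idx, idx)] = Sigma[np.ix_(idx, idx)]
    evals, evecs = np.linalg.eigh(D)
    D_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    gamma = min(1.0, 2.0 * np.linalg.eigvalsh(D_inv_sqrt @ Sigma @ D_inv_sqrt)[0])
    return gamma * D, gamma


# Toy genotypes: two thresholded latent AR(1) haplotypes summed into 0/1/2 dosages.
rng = np.random.default_rng(2)
n, p = 1000, 60
idx = np.arange(p)
L = np.linalg.cholesky(0.9 ** np.abs(idx[:, None] - idx[None, :]))


def haplotypes():
    return (rng.standard_normal((n, p)) @ L.T > 0.5).astype(float)


G = haplotypes() + haplotypes()

Sigma = shrunk_correlation(G, shrink=0.05)
groups = group_variants(Sigma, min_corr=0.5)
S, gamma = group_equi_S(Sigma, groups)
print(len(np.unique(groups)), "groups; gamma =", round(gamma, 3))
```

Choosing γ ≤ 2λ_min(D^{-1/2} Σ D^{-1/2}) is exactly the condition for 2Σ − S to be positive semidefinite, which is what makes the knockoff distribution valid; coarser groups generally allow a larger γ, trading resolution for power.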

Simulations based on five UK Biobank sub-populations illustrate the method’s FDR control. The analysis of 26 phenotypes of varying polygenicity in British individuals results in ≈19 additional discoveries compared with standard marginal association testing. Our code, precompiled software, and processed files for these five sub-populations are openly shared.
