Genomic privacy risks in GWAS summary statistics
Abstract
Rapid advances in sequencing technologies have exponentially increased the availability of genomic data, heightening concerns about data privacy. Publicly accessible genome-wide association study (GWAS) summary statistics are widely perceived as safe to share, yet we demonstrate that combining them with less sensitive high-dimensional phenotype data can leak substantial confidential genomic information. By transforming a linear regression model into linear programming constraints, we assess how much genomic data can be recovered from GWAS summary statistics. We find that an effective phenotype-to-sample-size ratio above 0.85 can enable full genotype recovery, and that a ratio above 0.16 is sufficient for individual identification. Certain non-European populations are especially vulnerable. These results underscore the urgent need for stronger privacy protections in genomic research while maintaining data utility.
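To illustrate the idea of turning a regression into linear programming constraints, the following is a minimal sketch on simulated data, not the authors' method. It assumes a no-intercept simple regression of each phenotype on one variant, so that each published slope satisfies beta_j = (x^T y_j) / (x^T x). It further assumes, hypothetically, that the attacker knows the phenotype matrix Y and the scalar x^T x. Under those assumptions each slope gives one equality that is linear in the unknown genotype vector x, and the genotype coding adds the bounds 0 <= x_i <= 2. All variable names are illustrative, and the sketch uses a ratio of exactly 1; it does not attempt to reproduce the 0.85 or 0.16 thresholds.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 20, 20  # samples and phenotypes (phenotype-to-sample ratio of 1)

# Simulated secret genotypes at one variant, coded 0/1/2.
x_true = rng.integers(0, 3, size=n).astype(float)
# Simulated phenotype matrix, assumed known to the attacker.
Y = rng.normal(size=(n, m))

# "Published" summary statistics: no-intercept OLS slope per phenotype.
xtx = x_true @ x_true  # assumed known to the attacker (hypothetical)
beta = (Y.T @ x_true) / xtx

# Each slope yields a linear equality in x: y_j^T x = beta_j * (x^T x).
# The objective is zero because only feasibility matters here.
res = linprog(
    c=np.zeros(n),
    A_eq=Y.T,
    b_eq=beta * xtx,
    bounds=[(0, 2)] * n,
    method="highs",
)

# Round the LP solution to the nearest genotype value.
x_hat = np.rint(res.x)
print("recovered all genotypes:", np.array_equal(x_hat, x_true))
```

With as many phenotypes as samples and a full-rank Y, the equalities alone determine x, so the linear program acts as a bounded linear solve. With fewer phenotypes the feasible set is larger and the bounds carry more of the weight, which this sketch does not explore.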