Dataset for a Multivariate Genome-Wide Association Study of Psycho-Cardiometabolic Multimorbidity
This dataset contains summary statistics from a multivariate genome-wide association study of major depression, coronary artery disease, and type 2 diabetes (i.e., psycho-cardiometabolic multimorbidity).
There are two sets of summary statistics: a version with UK Biobank (effective sample size = 562,507) and a version without UK Biobank (effective sample size = 156,717). Summary statistics for non-heterogeneous variants are also provided (obtained by removing variants with Q SNP P < 5e-8 and directionally discordant univariate effect estimates).
For a given single nucleotide polymorphism (SNP), the summary statistics include the chromosome; position; minor allele frequency; effect allele; non-effect allele; effect of the SNP on the common factor; standard error; ratio of effect/SE; p-value of Z_Estimate; chi-square statistic, providing the heterogeneity estimate for that SNP (equivalent to the Q SNP index); chi-square degrees of freedom; chi-square p-value; whether effect estimates from three input genome-wide association studies were directionally concordant.
Cite this dataset as:
Baltramonaityte, V., Pingault, J., Cecil, C., Choudhary, P., Järvelin, M., Penninx, B., Felix, J., Sebert, S., Milaneschi, Y., Walton, E., 2023. Dataset for a Multivariate Genome-Wide Association Study of Psycho-Cardiometabolic Multimorbidity. Bath: University of Bath Research Data Archive. Available from: https://doi.org/10.15125/BATH-01179.
Data
top_10000_SNPs.zip
application/zip (1MB)
Creative Commons: Attribution-Share Alike 4.0
Summary statistics for the top 10,000 variants from the genome-wide association study of psycho-cardiometabolic multimorbidity (i.e., major depression, coronary artery disease, and type 2 diabetes). This folder contains two files: top 10,000 SNPs for the version with UK Biobank (PCM_multimorbidity_summary_stats_withUKBB_10000_SNPs.txt) and top 10,000 SNPs for the version without UK Biobank (PCM_multimorbidity_summary_stats_noUKBB_10000_SNPs.txt).
Code
Psycho-cardiome … scripts.zip
application/zip (9kB)
Creative Commons: Attribution-Share Alike 4.0
R scripts that underlie the analysis in the associated publication
Mixed access regime: Due to licensing restrictions, only a subset of data is openly available from this repository. To access summary statistics for all SNPs associated with multimorbidity, a data transfer agreement is required with 23andMe (dataset-request@23andMe.com) before making a request from this repository.
Creators
Vilte Baltramonaityte
University of Bath
Jean-Baptiste Pingault
University College London
Charlotte A.M. Cecil
Erasmus University Medical Center
Priyanka Choudhary
University of Oulu
Marjo-Riitta Järvelin
Imperial College London
Brenda W.J.H. Penninx
Vrije Universiteit Amsterdam
Janine F. Felix
Erasmus University Medical Center
Sylvain Sebert
University of Oulu
Yuri Milaneschi
Vrije Universiteit Amsterdam
Esther Walton
University of Bath
Contributors
23andMe
Rights Holder
University of Bath
Rights Holder
Documentation
Data collection method:
This dataset was created using the genomic structural equation modeling (Genomic SEM) package in R. First, a common factor model was specified to capture the shared variance between major depression, coronary artery disease, and type 2 diabetes. This shared variance reflects the latent psycho-cardiometabolic multimorbidity factor. Subsequently, summary statistics for psycho-cardiometabolic multimorbidity were generated by regressing the latent factor on each SNP, resulting in 6,820,149 SNPs. Full details of the methodology may be found in the Methods section of the associated paper.
Technical details and requirements:
When using multimorbidity-associated variants as genetic instruments in Mendelian randomization analysis, we recommend using only non-heterogeneous SNPs. At a minimum, we recommend removing variants with QSNP P < 5e-8 and directionally discordant univariate effect estimates (option 1 below), as done in the non-heterogeneous summary statistics versions (PCM_multimorbidity_summary_stats_noUKBB_non-het.txt.gz and PCM_multimorbidity_summary_stats_withUKBB_non-het.txt.gz). To ensure your analysis captures multimorbidity between coronary artery disease, type 2 diabetes and major depression, you may also apply a more stringent threshold for removing heterogeneous variants (e.g., QSNP P < 5e-6; option 2 below). Sample code for implementing this in R is included below:
```
# load tidyverse package
library(tidyverse)

# read in summary statistics
sumstats <- read.delim("PCM_multimorbidity_summary_stats_withUKBB_non-het.txt.gz")

# Option 1: remove discordant variants with QSNP P < 5e-8
data_not_concordant <- sumstats %>% filter(concordant == "FALSE")
mm_snp_remove <- sumstats %>% filter(SNP %in% data_not_concordant$SNP) %>% filter(chisq_pval < 5e-8)
sumstats_nonhet <- sumstats %>% filter(!SNP %in% mm_snp_remove$SNP)

# save file
write.table(sumstats_nonhet, file = "newfile.txt", quote = FALSE, sep = "\t", row.names = FALSE)

# Option 2: remove variants with QSNP P < 5e-6 (keep variants at or above the threshold)
sumstats_nonhet <- sumstats %>% filter(chisq_pval >= 5e-6)

# save file
write.table(sumstats_nonhet, file = "newfile.txt", quote = FALSE, sep = "\t", row.names = FALSE)
```
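As an illustration only, the two filtering options can be checked on a small toy table before applying them to the real files. The sketch below uses base R; the helper name `filter_nonhet`, its arguments, and the toy SNP IDs and p-values are hypothetical and not part of the dataset or the associated scripts. It assumes a data frame with the documented `chisq_pval` and `concordant` columns.

```r
# Hypothetical helper: drop heterogeneous variants from a summary-statistics
# data frame that has the documented `chisq_pval` and `concordant` columns.
filter_nonhet <- function(sumstats, q_threshold = 5e-8, discordant_only = TRUE) {
  # Variants whose QSNP p-value falls below the chosen threshold
  heterogeneous <- sumstats$chisq_pval < q_threshold
  if (discordant_only) {
    # Option 1: only remove variants that are also directionally discordant
    heterogeneous <- heterogeneous & !as.logical(sumstats$concordant)
  }
  sumstats[!heterogeneous, , drop = FALSE]
}

# Toy data with made-up values, for illustration only
toy <- data.frame(
  SNP        = c("rs_a", "rs_b", "rs_c", "rs_d"),
  chisq_pval = c(0.40, 1e-9, 1e-9, 1e-7),
  concordant = c(TRUE, TRUE, FALSE, FALSE)
)

# Option 1: remove discordant variants with QSNP P < 5e-8
option1 <- filter_nonhet(toy)

# Option 2: remove all variants with QSNP P < 5e-6
option2 <- filter_nonhet(toy, q_threshold = 5e-6, discordant_only = FALSE)

print(option1$SNP)
print(option2$SNP)
```

In this toy example, option 1 drops only the variant that is both discordant and below 5e-8, whereas option 2 drops every variant below 5e-6 regardless of concordance.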
Additional information:
For a detailed description of both sets of statistics, please see the associated publication. Summary statistics include:
- SNP = single nucleotide polymorphism (SNP);
- CHR = chromosome;
- BP = position;
- MAF = minor allele frequency;
- A1 = effect allele;
- A2 = non-effect allele;
- BETA = effect of the SNP on the common factor;
- SE = standard error;
- Z_Estimate = ratio of effect/SE;
- P = p-value of Z_Estimate;
- chisq = chi-square statistic, providing the heterogeneity estimate for that SNP (equivalent to the Q SNP index);
- chisq_df = chi-square degrees of freedom;
- chisq_pval = chi-square p-value;
- concordant = indicates whether effect estimates from three input genome-wide association studies were directionally concordant (TRUE = all three estimates were concordant; FALSE = at least one estimate was not concordant)
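To make the relationships between these columns concrete, the base R sketch below recomputes the derived columns on a toy table. The SNP IDs and numeric values are made up, and two points are assumptions rather than statements from the record: that P is the two-sided normal p-value of Z_Estimate, and that chisq_pval is the upper-tail chi-square probability for chisq with chisq_df degrees of freedom.

```r
# Toy rows using the documented column names; all values are made up.
toy <- data.frame(
  SNP      = c("rs_example1", "rs_example2"),
  BETA     = c(0.020, -0.015),
  SE       = c(0.004, 0.005),
  chisq    = c(1.5, 40.0),
  chisq_df = c(2, 2)
)

# Z_Estimate is the ratio of the effect to its standard error
toy$Z_Estimate <- toy$BETA / toy$SE

# Assumption: P is the two-sided normal p-value of Z_Estimate
toy$P <- 2 * pnorm(abs(toy$Z_Estimate), lower.tail = FALSE)

# Assumption: chisq_pval is the upper-tail chi-square probability
toy$chisq_pval <- pchisq(toy$chisq, df = toy$chisq_df, lower.tail = FALSE)

print(toy)
```

A check of this kind can help confirm that a downloaded file was parsed with the expected column types before it is used in downstream analyses.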
Methodology link:
Grotzinger, A. D., Nivard, M. G., and Tucker-Drob, E. M., 2022. GenomicSEM for Common Factor GWAS. In: GenomicSEM wiki. GitHub. Available from: https://github.com/GenomicSEM/GenomicSEM/wiki/4.-Common-Factor-GWAS.
Baltramonaityte, V., Pingault, J.-B., Cecil, C. A. M., Choudhary, P., Järvelin, M.-R., Penninx, B. W. J. H., Felix, J., Sebert, S., Milaneschi, Y., and Walton, E., 2023. A multivariate genome-wide association study of psycho-cardiometabolic multimorbidity. PLOS Genetics, 19(6), e1010508. Available from: https://doi.org/10.1371/journal.pgen.1010508.
Funders
Horizon 2020 Framework Programme
https://doi.org/10.13039/100010661
EarlyCause
848158
Publication details
Publication date: 30 June 2023
Published by: University of Bath
Version: 1
DOI: https://doi.org/10.15125/BATH-01179
URL for this record: https://researchdata.bath.ac.uk/id/eprint/1179
Related papers and books
Baltramonaityte, V., Pingault, J.-B., Cecil, C. A. M., Choudhary, P., Järvelin, M.-R., Penninx, B. W. J. H., Felix, J., Sebert, S., Milaneschi, Y., and Walton, E., 2023. A multivariate genome-wide association study of psycho-cardiometabolic multimorbidity. PLOS Genetics, 19(6), e1010508. Available from: https://doi.org/10.1371/journal.pgen.1010508.
Contact information
Please contact the Research Data Service in the first instance for all matters concerning this item.
Contact person: Vilte Baltramonaityte
Faculty of Humanities & Social Sciences
Psychology