Dataset for a Multivariate Genome-Wide Association Study of Psycho-Cardiometabolic Multimorbidity

This dataset contains summary statistics from a multivariate genome-wide association study of major depression, coronary artery disease, and type 2 diabetes (i.e., psycho-cardiometabolic multimorbidity).

There are two sets of summary statistics: a version with UK Biobank (effective sample size = 562,507) and a version without UK Biobank (effective sample size = 156,717). Summary statistics for non-heterogeneous variants are also provided (obtained by removing variants that have both Q SNP P < 5e-8 and directionally discordant univariate effect estimates).

For a given single nucleotide polymorphism (SNP), the summary statistics include the chromosome; position; minor allele frequency; effect allele; non-effect allele; effect of the SNP on the common factor; standard error; ratio of effect/SE (Z_Estimate); p-value of Z_Estimate; chi-square statistic, providing the heterogeneity estimate for that SNP (equivalent to the Q SNP index); chi-square degrees of freedom; chi-square p-value; and whether the effect estimates from the three input genome-wide association studies were directionally concordant.

Keywords:

Multimorbidity, Coronary artery disease, Type 2 diabetes, Depression, GWAS, Summary statistics, Psycho-cardiometabolic, EarlyCause

Subjects:

Psychology

Cite this dataset as:
Baltramonaityte, V., Pingault, J., Cecil, C., Choudhary, P., Järvelin, M., Penninx, B., Felix, J., Sebert, S., Milaneschi, Y., Walton, E., 2023. Dataset for a Multivariate Genome-Wide Association Study of Psycho-Cardiometabolic Multimorbidity. Bath: University of Bath Research Data Archive. Available from: https://doi.org/10.15125/BATH-01179.

Data

top_10000_SNPs.zip
application/zip (1MB)
Creative Commons: Attribution-Share Alike 4.0

Summary statistics for the top 10,000 variants from the genome-wide association study of psycho-cardiometabolic multimorbidity (i.e., major depression, coronary artery disease, and type 2 diabetes). This folder contains two files: top 10,000 SNPs for the version with UK Biobank (PCM_multimorbidity_summary_stats_withUKBB_10000_SNPs.txt) and top 10,000 SNPs for the version without UK Biobank (PCM_multimorbidity_summary_stats_noUKBB_10000_SNPs.txt).

Code

Psycho-cardiome … scripts.zip
application/zip (9kB)
Creative Commons: Attribution-Share Alike 4.0

R scripts that underlie the analysis in the associated publication

Request access to restricted data

Mixed access regime: Due to licensing restrictions, only a subset of data is openly available from this repository. To access summary statistics for all SNPs associated with multimorbidity, a data transfer agreement is required with 23andMe (dataset-request@23andMe.com) before making a request from this repository.

GitHub repository containing the R scripts used

Creators

Vilte Baltramonaityte
University of Bath

0000-0002-9776-735X

Jean-Baptiste Pingault
University College London

0000-0003-2557-4716

Charlotte A.M. Cecil
Erasmus University Medical Center

0000-0002-2389-5922

Priyanka Choudhary
University of Oulu

0000-0002-2420-4118

Marjo-Riitta Järvelin
Imperial College London

0000-0002-2149-0630

Brenda W.J.H. Penninx
Vrije Universiteit Amsterdam

0000-0001-7779-9672

Janine F. Felix
Erasmus University Medical Center

0000-0002-9801-5774

Sylvain Sebert
University of Oulu

0000-0001-6681-6983

Yuri Milaneschi
Vrije Universiteit Amsterdam

0000-0002-3697-6617

Esther Walton
University of Bath

0000-0002-0935-2200

Contributors

23andMe
Rights Holder

University of Bath
Rights Holder

Documentation

Data collection method:

This dataset was created using the genomic structural equation modeling (Genomic SEM) package in R. First, a common factor model was specified to capture the shared variance between major depression, coronary artery disease, and type 2 diabetes. The shared variance reflected the latent psycho-cardiometabolic multimorbidity factor. Subsequently, summary statistics for psycho-cardiometabolic multimorbidity were generated by regressing the latent factor on each SNP, resulting in 6,820,149 SNPs. Full details of the methodology may be found in the Methods section of the associated paper.
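The two steps described above can be written schematically as follows. This is a generic one-factor notation offered as a sketch, not the exact parameterization used by Genomic SEM; the symbols $g_k$, $\lambda_k$, $F$, $u_k$, $\beta$, and $\zeta$ are introduced here for illustration only.

```latex
\begin{align}
g_k &= \lambda_k F + u_k, \qquad k \in \{\text{MD}, \text{CAD}, \text{T2D}\} \\
F   &= \beta \cdot \text{SNP} + \zeta
\end{align}
```

Here $g_k$ stands for the genetic component of trait $k$, $\lambda_k$ for its loading on the latent multimorbidity factor $F$, and $u_k$ for the trait-specific residual. Under this reading, $\beta$ corresponds to the effect of the SNP on the common factor reported in the summary statistics.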

Technical details and requirements:

When using multimorbidity-associated variants as genetic instruments in Mendelian randomization analysis, we recommend using only non-heterogeneous SNPs. At a minimum, we recommend removing variants that have both QSNP P < 5e-8 and directionally discordant univariate effect estimates (option 1 below), as done in the non-heterogeneous summary statistics versions (PCM_multimorbidity_summary_stats_noUKBB_non-het.txt.gz and PCM_multimorbidity_summary_stats_withUKBB_non-het.txt.gz). To ensure your analysis captures multimorbidity between coronary artery disease, type 2 diabetes, and major depression, you may also apply a more stringent threshold for removing heterogeneous variants (e.g., QSNP P < 5e-6; option 2 below). Sample code for implementing this in R is included below:

```
# load tidyverse package
library(tidyverse)

# read in summary statistics
sumstats <- read.delim("PCM_multimorbidity_summary_stats_withUKBB_non-het.txt.gz")

# Option 1: remove discordant variants with QSNP P < 5e-8
data_not_concordant <- sumstats %>% filter(concordant == "FALSE")
mm_snp_remove <- sumstats %>%
  filter(SNP %in% data_not_concordant$SNP) %>%
  filter(chisq_pval < 0.00000005)
sumstats_nonhet <- sumstats %>% filter(!SNP %in% mm_snp_remove$SNP)

# save file
write.table(sumstats_nonhet, file = "newfile.txt", quote = FALSE, sep = "\t", row.names = FALSE)

# Option 2: remove variants with QSNP P < 5e-6 (i.e., keep chisq_pval >= 5e-6)
sumstats_nonhet <- sumstats %>% filter(chisq_pval >= 0.000005)

# save file
write.table(sumstats_nonhet, file = "newfile.txt", quote = FALSE, sep = "\t", row.names = FALSE)
```
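To make the difference between the two options concrete, the following base R sketch applies both filters to a small in-memory table. The SNP identifiers and p-values are invented for illustration and do not come from the dataset; only the column names `SNP`, `chisq_pval`, and `concordant` follow the documented file layout.

```r
# Toy summary statistics; SNP IDs and values are invented for illustration.
toy <- data.frame(
  SNP        = c("rs_a", "rs_b", "rs_c", "rs_d"),
  chisq_pval = c(0.40, 1e-9, 1e-9, 1e-7),
  concordant = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Option 1: drop a variant only if it is BOTH directionally discordant
# and has QSNP P < 5e-8.
drop_opt1 <- !toy$concordant & toy$chisq_pval < 5e-8
option1 <- toy[!drop_opt1, ]

# Option 2: drop every variant with QSNP P < 5e-6, regardless of concordance.
option2 <- toy[toy$chisq_pval >= 5e-6, ]

print(option1$SNP)
print(option2$SNP)
```

In this toy table, option 1 removes only the discordant variant below the 5e-8 threshold, whereas option 2 removes every variant with a heterogeneity p-value below 5e-6.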

Additional information:

For a detailed description of both sets of statistics, please see the associated publication. Summary statistics include:
- SNP = single nucleotide polymorphism (SNP);
- CHR = chromosome;
- BP = position;
- MAF = minor allele frequency;
- A1 = effect allele;
- A2 = non-effect allele;
- BETA = effect of the SNP on the common factor;
- SE = standard error;
- Z_Estimate = ratio of effect/SE;
- P = p-value of Z_Estimate;
- chisq = chi-square statistic, providing the heterogeneity estimate for that SNP (equivalent to the Q SNP index);
- chisq_df = chi-square degrees of freedom;
- chisq_pval = chi-square p-value;
- concordant = indicates whether effect estimates from three input genome-wide association studies were directionally concordant (TRUE = all three estimates were concordant; FALSE = at least one estimate was not concordant)
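The relationships between several of these columns can be illustrated with a short base R sketch. The rows below are invented, and two points are assumptions rather than statements from the dataset documentation: that P is a two-sided p-value under a standard normal reference, and that the toy degrees of freedom equal 2.

```r
# Toy rows using the documented column names; all values are invented.
toy <- data.frame(
  SNP      = c("rs_toy1", "rs_toy2", "rs_toy3"),
  BETA     = c(0.020, -0.015, 0.004),
  SE       = c(0.004, 0.005, 0.004),
  chisq    = c(1.2, 35.0, 4.6),
  chisq_df = c(2, 2, 2),  # assumed value, for illustration only
  stringsAsFactors = FALSE
)

# Z_Estimate is the ratio of the effect to its standard error.
toy$Z_Estimate <- toy$BETA / toy$SE

# Two-sided p-value for Z_Estimate (assumes a standard normal reference).
toy$P <- 2 * pnorm(-abs(toy$Z_Estimate))

# Upper-tail chi-square p-value for the heterogeneity statistic.
toy$chisq_pval <- pchisq(toy$chisq, df = toy$chisq_df, lower.tail = FALSE)

print(toy)
```

A check of this kind may be useful for confirming that a downloaded file has been parsed with the expected column types before any filtering is applied.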

Methodology link:

Grotzinger, A. D., Nivard, M. G., and Tucker-Drob, E. M., 2022. GenomicSEM for Common Factor GWAS. In: GenomicSEM wiki. GitHub. Available from: https://github.com/GenomicSEM/GenomicSEM/wiki/4.-Common-Factor-GWAS.

Baltramonaityte, V., Pingault, J.-B., Cecil, C. A. M., Choudhary, P., Järvelin, M.-R., Penninx, B. W. J. H., Felix, J., Sebert, S., Milaneschi, Y., and Walton, E., 2023. A multivariate genome-wide association study of psycho-cardiometabolic multimorbidity. PLOS Genetics, 19(6), e1010508. Available from: https://doi.org/10.1371/journal.pgen.1010508.

Funders

Horizon 2020 Framework Programme
https://doi.org/10.13039/100010661

EarlyCause
848158

Publication details

Publication date: 30 June 2023
by: University of Bath

Version: 1

DOI: https://doi.org/10.15125/BATH-01179

URL for this record: https://researchdata.bath.ac.uk/1179

Contact information

Please contact the Research Data Service in the first instance for all matters concerning this item.

Contact person: Vilte Baltramonaityte

Departments:

Faculty of Humanities & Social Sciences
Psychology