Title: | Hierarchical Joint Analysis of Marginal Summary Statistics |
---|---|
Description: | Provides functions to implement a hierarchical approach which is designed to perform joint analysis of summary statistics using the framework of Mendelian Randomization or transcriptome analysis. Reference: Lai Jiang, Shujing Xu, Nicholas Mancuso, Paul J. Newcombe, David V. Conti (2020). "A Hierarchical Approach Using Marginal Summary Statistics for Multiple Intermediates in a Mendelian Randomization or Transcriptome Analysis." <bioRxiv><doi:10.1101/2020.02.03.924241>. |
Authors: | Lai Jiang <[email protected]> |
Maintainer: | Lai Jiang <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.1 |
Built: | 2025-01-13 05:35:47 UTC |
Source: | https://github.com/uscbiostats/hjam |
Function to implement regularized hJAM, including elastic net hJAM and lasso hJAM.
EN.hJAM( betas.Gy, N.Gy, eaf.Gy = NULL, Geno, A, tune_glmnet = 0.5, ridgeTerm = FALSE )
EN.hJAM( betas.Gy, N.Gy, eaf.Gy = NULL, Geno, A, tune_glmnet = 0.5, ridgeTerm = FALSE )
betas.Gy |
The betas in the paper: the marginal effects of SNPs on the phenotype (Gy) |
N.Gy |
The sample size of the GWAS where you obtain the betas.Gy and betas_se.Gy |
eaf.Gy |
The effect allele frequency of the SNPs in betas.Gy |
Geno |
The individual level data of the reference panel. Must have the same order of SNPs as in the betas.Gy. |
A |
The conditional A matrix. |
tune_glmnet |
The |
ridgeTerm |
Add a small elelment to the diagnoal of X'X to make the matrix invertable. |
An object of the Regularized hJAM
The number of SNPs that the user use in the instrument set.
The number of selected intermediates, regardless of the credible sets.
The label/name for each selected intermediates.
The coefficients of selected intermediates. Otherwise will be zero.
Lai Jiang
data(ENhJAM.SimulationSet) EN.hJAM(betas.Gy = Simulation.betas.gwas, N.Gy = 5000, eaf.Gy = Simulation.maf.gwas, Geno = Simulation.Geno, A = Simulation.Amatrix, ridgeTerm = FALSE)
data(ENhJAM.SimulationSet) EN.hJAM(betas.Gy = Simulation.betas.gwas, N.Gy = 5000, eaf.Gy = Simulation.maf.gwas, Geno = Simulation.Geno, A = Simulation.Amatrix, ridgeTerm = FALSE)
Simulation data for EN-hJAM
The ENhJAM.SimulationSet
is a set of simulation data sets for the example of elastic net hJAM
The conditional matrix with 118 metabolites and 144 SNPs, which was composed by SuSiE JAM and the marginal
matrix.
The reference genotype data for the 144 SNPs from the European-ancestry population in 1000 Genome Project (Consortium, 2015).
The b vector. The association estimates between selected SNPs and the risk of prostate cancer from (Schumacher et al., 2018)
The se(b) vector from (Schumacher et al., 2018)
The vector of the effect allele frequency of the SNPs from (Schumacher et al., 2018)
To calculate sufficient statistics based on summary statistics
To calculate sufficient statistics based on summary statistics
get_XtX(N_outcome, Gl, maf) get_XtX(N_outcome, Gl, maf)
get_XtX(N_outcome, Gl, maf) get_XtX(N_outcome, Gl, maf)
N_outcome |
Sample size in the GWAS where we obtained 'betas' |
Gl |
A matrix of reference dosage, columns are SNPs and rows are individuals. |
maf |
A vector of minor allele frequencies |
a variance covariance matrix of scaled Gl
a variance covariance matrix of scaled Gl
To calculate sufficient statistics based on summary statistics. This yty estimate follows Yang et al. (2012) Nat Gen. Marginal estimates from one SNP will produce one yty estimates. Yang suggests taking the median across all SNPs to obtain a robust estimate. Here we record all yty estimates and output both the median and the entire vector.
To calculate sufficient statistics based on summary statistics. This yty estimate follows Yang et al. (2012) Nat Gen. Marginal estimates from one SNP will produce one yty estimates. Yang suggests taking the median across all SNPs to obtain a robust estimate. Here we record all yty estimates and output both the median and the entire vector.
get_yty(maf, N_outcome, betas, betas.se) get_yty(maf, N_outcome, betas, betas.se)
get_yty(maf, N_outcome, betas, betas.se) get_yty(maf, N_outcome, betas, betas.se)
maf |
A vector of minor allele frequencies |
N_outcome |
Sample size in the GWAS where we obtained 'betas' |
betas |
A vector of marginal estimates of effect sizes (betas for continuous outcome; logOR for binary outcome) |
betas.se |
A vector of the standard errors of marginal effect estimates ('betas'). |
median of yty estimates across all SNPs; and a vector of all yty estimates
median of yty estimates across all SNPs; and a vector of all yty estimates
To calculate sufficient statistics based on summary statistics
To calculate sufficient statistics based on summary statistics
get_z(maf, betas, N_outcome) get_z(maf, betas, N_outcome)
get_z(maf, betas, N_outcome) get_z(maf, betas, N_outcome)
maf |
A vector of minor allele frequencies |
betas |
A vector of marginal estimates of effect sizes (betas for continuous outcome; logOR for binary outcome) |
N_outcome |
Sample size in the GWAS where we obtained 'betas' |
a numeric vector of calculated z statistic
a numeric vector of calculated z statistic
Real data for selecting the genes on chromosome 10 for the risk of prostate cancer
The GTEx.PrCa
is a set of data sets which was applied for selecting the genes on chromosome 10 for the risk of prostate cancer
The marginal matrix with 158 genes and 182 eQTLs. The raw data was downloaded from GTEx analysis v7 (https://gtexportal.org/home/datasets). Priority Pruner was used to select the independent eQTLs. We used this matrix for MR-BMA implementation.)
The marginal matrix with 167 genes and 447 eQTLs. The raw data was downloaded from GTEx analysis v7 (https://gtexportal.org/home/datasets). This is the raw
matrix for constructing the conditional weight matrix for SHA-JAM analysis.
The standard errors of the marginal effects for the SNP-gene pairs (167 genes, 447 eQTLs). The raw data was downloaded from GTEx analysis v7 (https://gtexportal.org/home/datasets).
The inclusion indicator for the significant SNP-gene pairs (167 genes, 447 eQTLs). Significant as 1; otherwise 0. This matrix is for composing the conditional weight matrix using the raw data.
The conditional matrix with 167 genes and 447 eQTLs, which was composed by SuSiE JAM and the raw data of
matrix.
The reference genotype data for the 447 eQTLs from the European-ancestry population in 1000 Genome Project (Consortium, 2015)
The b vector. The association estimates between eQTLs and the risk of prostate cancer from (Schumacher et al., 2018)
The se(b) vector from (Schumacher et al., 2018)
The pvalues vector of the association estimates between selected SNPs and the risk of prostate cancer from (Schumacher et al., 2018)
The vector of the effect allele frequency of the SNPs from (Schumacher et al., 2018)
Consortium GP. A global reference for human genetic variation. Nature 2015; 526: 68.
Lonsdale, John, et al. The genotype-tissue expression (GTEx) project. Nature genetics 45.6 (2013): 580-585.
Schumacher, Fredrick R., et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nature genetics 50.7 (2018): 928-936.
The hJAM function is to get the results from the hJAM model using input data
hJAM(betas.Gy, N.Gy, Geno, A, ridgeTerm = FALSE)
hJAM(betas.Gy, N.Gy, Geno, A, ridgeTerm = FALSE)
betas.Gy |
The betas in the paper: the marginal effects of SNPs on the phenotype (Gy) |
N.Gy |
The sample size of Gy |
Geno |
The reference panel (Geno), such as 1000 Genome |
A |
The A matrix in the paper: the marginal/conditional effects of SNPs on the exposures (Gx) |
ridgeTerm |
ridgeTerm = TRUE when the matrix L is singular. Matrix L is obtained from the cholesky decomposition of G0'G0. Default as FALSE. |
An object of the hJAM with linear regression results.
The intermediates, such as the modifiable risk factors in Mendelian Randomization and gene expression in transcriptome analysis.
The number of SNPs that the user use in the instrument set.
The conditional estimates of the associations between intermediates and the outcome.
The standard error of the conditional estimates of the associations between intermediates and the outcome.
The lower bound of the 95% confidence interval of the estimates.
The upper bound of the 95% confidence interval of the estimates.
The p value of the estimates with a type-I error equals 0.05.
Lai Jiang
Lai Jiang, Shujing Xu, Nicholas Mancuso, Paul J. Newcombe, David V. Conti (2020). A Hierarchical Approach Using Marginal Summary Statistics for Multiple Intermediates in a Mendelian Randomization or Transcriptome Analysis. bioRxiv https://doi.org/10.1101/2020.02.03.924241.
data(MI) hJAM(betas.Gy = MI.betas.gwas, Geno = MI.Geno, N.Gy = 459324, A = MI.Amatrix, ridgeTerm = TRUE)
data(MI) hJAM(betas.Gy = MI.betas.gwas, Geno = MI.Geno, N.Gy = 459324, A = MI.Amatrix, ridgeTerm = TRUE)
The hJAM_egger function is to get the results from the hJAM model with Egger regression. It is for detecting potential pleiotropy
hJAM_egger(betas.Gy, N.Gy, Geno, A, ridgeTerm = TRUE)
hJAM_egger(betas.Gy, N.Gy, Geno, A, ridgeTerm = TRUE)
betas.Gy |
The betas in the paper: the marginal effects of SNPs on the phenotype (Gy) |
N.Gy |
The sample size of Gy |
Geno |
The reference panel (Geno), such as 1000 Genome |
A |
The A matrix in the paper: the marginal/conditional effects of SNPs on the exposures (Gx) |
ridgeTerm |
ridgeTerm = TRUE when the matrix L is singular. Matrix L is obtained from the cholesky decomposition of G0'G0. Default as TRUE |
An object of the hJAM with egger regression results.
The intermediates, such as the modifiable risk factors in Mendelian Randomization and gene expression in transcriptome analysis.
The number of SNPs that the user use in the instrument set.
The conditional estimates of the associations between intermediates and the outcome.
The standard error of the conditional estimates of the associations between intermediates and the outcome.
The lower bound of the 95% confidence interval of the estimates.
The upper bound of the 95% confidence interval of the estimates.
The p value of the estimates with a type-I error equals 0.05.
The intercept of the regression of intermediates on the outcome.
The standard error of the intercept of the regression of intermediates on the outcome.
The lower bound of the 95% confidence interval of the intercept.
The upper bound of the 95% confidence interval of the intercept.
The p value of the intercept with a type-I error equals 0.05.
An object of hJAM with egger regression results.
Lai Jiang
Lai Jiang, Shujing Xu, Nicholas Mancuso, Paul J. Newcombe, David V. Conti (2020). A Hierarchical Approach Using Marginal Summary Statistics for Multiple Intermediates in a Mendelian Randomization or Transcriptome Analysis. bioRxiv https://doi.org/10.1101/2020.02.03.924241.
data(MI) hJAM_egger(betas.Gy = MI.betas.gwas, Geno = MI.Geno, N.Gy = 459324, A = MI.Amatrix)
data(MI) hJAM_egger(betas.Gy = MI.betas.gwas, Geno = MI.Geno, N.Gy = 459324, A = MI.Amatrix)
The JAM_A
function is to get the conditional A matrix by using marginal A matrix
JAM_A(marginalA, Geno, N.Gx, eaf_Gx = NULL, ridgeTerm = TRUE)
JAM_A(marginalA, Geno, N.Gx, eaf_Gx = NULL, ridgeTerm = TRUE)
marginalA |
the marginal effects of SNPs on the exposures (Gx). |
Geno |
the reference panel (Geno), such as 1000 Genome |
N.Gx |
the sample size of each Gx. It can be a scalar or a vector. If there are multiple X's from different Gx, it should be a vector including the sample size of each Gx. If all alphas are from the same Gx, it could be a scalar. |
eaf_Gx |
the effect allele frequency of the SNPs in the Gx data. |
ridgeTerm |
ridgeTerm = TRUE when the matrix L is singular. Matrix L is obtained from the cholesky decomposition of G0'G0. Default as TRUE. |
A matrix with conditional estimates which are converted from marginal estimates using the JAM model.
Lai Jiang
data(MI) JAM_A(marginalA = MI.marginal.Amatrix, Geno = MI.Geno, N.Gx = c(339224, 659316), ridgeTerm = TRUE) JAM_A(marginalA = MI.marginal.Amatrix, Geno = MI.Geno, N.Gx = c(339224, 659316), eaf_Gx = MI.SNPs_info$ref_frq)
data(MI) JAM_A(marginalA = MI.marginal.Amatrix, Geno = MI.Geno, N.Gx = c(339224, 659316), ridgeTerm = TRUE) JAM_A(marginalA = MI.marginal.Amatrix, Geno = MI.Geno, N.Gx = c(339224, 659316), eaf_Gx = MI.SNPs_info$ref_frq)
The JAM_alphas
function is to compute the conditional alpha vector for each X
If only one X in the model, please use JAM_alphas instead of JAM_A
A sub-step in the JAM_A function
JAM_alphas(marginalA, Geno, N.Gx, eaf_Gx = NULL, ridgeTerm = TRUE)
JAM_alphas(marginalA, Geno, N.Gx, eaf_Gx = NULL, ridgeTerm = TRUE)
marginalA |
the marginal effects of SNPs on one exposure (Gx). |
Geno |
the reference panel (Geno), such as 1000 Genome |
N.Gx |
the sample size of the Gx. It can be a scalar. |
eaf_Gx |
the effect allele frequency of the SNPs in the Gx data. |
ridgeTerm |
ridgeTerm = TRUE when the matrix L is singular. Matrix L is obtained from the cholesky decomposition of G0'G0. Default as TRUE. |
A vector with conditional estimates which are converted from marginal estimates using the JAM model.
Lai Jiang
Lai Jiang, Shujing Xu, Nicholas Mancuso, Paul J. Newcombe, David V. Conti (2020). A Hierarchical Approach Using Marginal Summary Statistics for Multiple Intermediates in a Mendelian Randomization or Transcriptome Analysis. bioRxiv https://doi.org/10.1101/2020.02.03.924241.
data(MI) JAM_alphas(marginalA = MI.marginal.Amatrix[, 1], Geno = MI.Geno, N.Gx = 339224) JAM_alphas(marginalA = MI.marginal.Amatrix[, 1], Geno = MI.Geno, N.Gx = 339224, eaf_Gx = MI.SNPs_info$ref_frq)
data(MI) JAM_alphas(marginalA = MI.marginal.Amatrix[, 1], Geno = MI.Geno, N.Gx = 339224) JAM_alphas(marginalA = MI.marginal.Amatrix[, 1], Geno = MI.Geno, N.Gx = 339224, eaf_Gx = MI.SNPs_info$ref_frq)
Adopted from R2BGLiMS::JAM_LogisticToLinearEffects. Reference: Benner 2015, FINEMAP
Adopted from R2BGLiMS::JAM_LogisticToLinearEffects. Reference: Benner 2015, FINEMAP
LogisticToLinearEffects( log.ors = NULL, log.or.ses = NULL, snp.genotype.sds = NULL, mafs = NULL, n = NULL, p.cases = NULL ) LogisticToLinearEffects( log.ors = NULL, log.or.ses = NULL, snp.genotype.sds = NULL, mafs = NULL, n = NULL, p.cases = NULL )
LogisticToLinearEffects( log.ors = NULL, log.or.ses = NULL, snp.genotype.sds = NULL, mafs = NULL, n = NULL, p.cases = NULL ) LogisticToLinearEffects( log.ors = NULL, log.or.ses = NULL, snp.genotype.sds = NULL, mafs = NULL, n = NULL, p.cases = NULL )
log.ors |
A vector of log odds ratios |
log.or.ses |
A vector of the standard errors of the log ORs |
snp.genotype.sds |
A vector of standard deviations of genotypes (optional if 'mafs' is provided) |
mafs |
A vector of effective allele frequencies (optional if 'snp.genotype.sds' is provided) |
n |
Sample size in the GWAS where we obtained 'log.ors' |
p.cases |
A numeric value of the proportion of cases in the GWAS. |
Transformed linear effect estimates, and transformed standards errors of linear effects.
Transformed linear effect estimates, and transformed standards errors of linear effects.
Real data for BMI/T2D on the risk of Myocardial infarction
The MI
object is a set of data sets which was used to estimate the causal effect of body mass index and type 2 diabetes on the risk of myocardial infarction.
The marginal matrix. Column one and two are the marginal estimates of the SNPs on body mass index from GIANT consortium (n = 339,224) (Locke et al., 2015) and type 2 diabetes from DIAGRAM+GERA+UKB (n = 659,316) (Xue et al., 2018), respectively
The conditional matrix composed by JAM and the marginal
matrix. Column one and two are the conditional effect estimates of the SNPs on body mass index and type 2 diabetes, respectively.
The reference genotype data from the European-ancestry population in 1000 Genome Project (Consortium, 2015).
The b vector. The association estimates between selected SNPs and the risk of myocardial infarction from UK Biobank (Sudlow et al., 2015).
The SNP information. Five columns included: the RSID, reference allele, reference allele frequency, if BMI significant and if T2D significant. The last two columns are indicator variables for the SNPs which are genome-wide significant associated with BMI/T2D.
Consortium GP. A global reference for human genetic variation. Nature 2015; 526: 68.
Locke, Adam E., et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518.7538 (2015): 197-206.
Xue, Angli, et al. Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nature communications 9.1 (2018): 1-14.
Sudlow, Cathie, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. Plos med 12.3 (2015): e1001779.
Construct mJAM credible set based for selected index SNP
mJAM_build_CS( X_id, prev_X_list = NULL, All_id, PrCS_weights = "Pr(M_C)", coverage = 0.95, GItGI_curr, GIty_curr, yty_curr, yty_med, N_GWAS, rare_SNPs = NULL, Pr_Med_cut = 0.1, use_robust_var_est = FALSE )
mJAM_build_CS( X_id, prev_X_list = NULL, All_id, PrCS_weights = "Pr(M_C)", coverage = 0.95, GItGI_curr, GIty_curr, yty_curr, yty_med, N_GWAS, rare_SNPs = NULL, Pr_Med_cut = 0.1, use_robust_var_est = FALSE )
X_id |
A character specifying the ID of the index SNP; should be found in 'All_id'. |
prev_X_list |
A list of character vector of the ID(s) of previously selected index SNP(s). |
All_id |
A list of character vector of the ID(s) of all SNP(s) remaining in the analysis, including all previously selected SNP(s) and the current index SNP. |
PrCS_weights |
An option to specify what weights to apply on Pr(Med). Default is "Pr(M_C)". |
coverage |
A number between 0 and 1 specifying the “coverage” of the estimated confidence sets. |
GItGI_curr |
A list of GItGI statistics at the current stage (after pruning out SNPs correlated with previously selected index SNPs). |
GIty_curr |
A list of GIty estimates of all remaining SNPs at the current stage (after pruning out SNPs correlated with previously selected index SNPs). |
yty_curr |
A list of yty estimates of all remaining SNPs at the current stage (after pruning out SNPs correlated with previously selected index SNPs). |
yty_med |
A list of median yty across all SNPs. |
N_GWAS |
A vector of sample sizes in all original GWAS studies. |
rare_SNPs |
A numeric vector of ID(s) for rare SNP(s) which we do not apply weighting. Instead, we use the individual estimate of yty for these SNPs for robustness. |
Pr_Med_cut |
The cutoff for Pr(Mediation); SNPs with Pr(Mediation) smaller than this cutoff will be assigned a Pr(CS) = 0 and thus not included in the credible set for the current index |
use_robust_var_est |
whether to use linear combination of median yty and individual yty. |
A table with the following columns:
SNP name.
The posterior Pr(Model) of this SNP on its absolute scale.
The posterior Pr(Model) of this SNP divided by the posterior Pr(Model) of index SNP. It should be <= 1.
If ‘Post_Model_Prob_Ratio' is greater than 1, set 'Post_Model_Prob_Ratio2' to 1. Otherwise, it’s the same as 'Post_Model_Prob_Ratio'.
The posterior mediation effect size.
The posterior Pr(Mediation) of this SNP.
If ‘Post_Med_Prob' is less than 'Pr_Med_cut', set 'Post_Med_Prob2' to 0. Otherwise, it’s the same as 'Post_Med_Prob'.
Standardized Pr(CS) where Pr(CS) = Pr(Model)*Pr(Mediation)
The cumulative 'SD_Post_CS_Prob'. Note that the table is ordered by descending 'SD_Post_CS_Prob'.
The empirical coverage of this CS (should be >= requested 'coverage').
A logical variable indicating whether this CS_SNP is included in this CS or not.
The name of the index SNP.
Jiayi Shen
fitting mJAM-Forward
mJAM_Forward( N_GWAS, X_ref, Marg_Result, EAF_Result, condp_cut = NULL, index_snps = NULL, within_pop_threshold = 0.5, across_pop_threshold = 0.2, coverage = 0.95, Pr_Med_cut = 0, filter_rare = FALSE, rare_freq = NULL, filter_unstable_est = FALSE, use_robust_var_est = FALSE )
mJAM_Forward( N_GWAS, X_ref, Marg_Result, EAF_Result, condp_cut = NULL, index_snps = NULL, within_pop_threshold = 0.5, across_pop_threshold = 0.2, coverage = 0.95, Pr_Med_cut = 0, filter_rare = FALSE, rare_freq = NULL, filter_unstable_est = FALSE, use_robust_var_est = FALSE )
N_GWAS |
A vector of sample sizes in all original GWAS studies. |
X_ref |
A list of matrices with individual-level SNP dosage data in each study/population. Each column corresponds to a SNP. Note that the columns name should match exactly to the SNP column in 'Marg_Result' and 'EAF_Result'. If certain SNP(s) is missing in dosage, then insert NAs in corresponding column(s). |
Marg_Result |
A data frame with marginal summary statistics from all studies. Col1: SNP name; Col2: Effect sizes from study #1; Col3: Std Errors of effect sizes from study #1; ... |
EAF_Result |
A data frame with effect allele frequency (EAF) from all studies. Col1: SNP name; Col2: EAF from study #1; Col3: EAF from study #2; ... |
condp_cut |
Threshold of conditional p-value to be considered as significant. No default specified. Usually recommend 5e-8. |
index_snps |
User-defined index SNP(s), if any. Default is 'NULL' which means mJAM-Forward will automatically select index variants. |
within_pop_threshold |
Threshold of r2 with selected index SNP(s) within a single population. If a SNP's correlation with any selected index SNP is greater than this threshold in at least one population, it will be excluded from subsequent rounds of index SNP selection. |
across_pop_threshold |
Threshold of r2 with selected index SNP(s) across all populations. If a SNP's correlation with any selected index SNP is greater than this threshold in all populations, it will be excluded from subsequent rounds of index SNP selection. |
coverage |
The required coverage of credible sets. Default is 0.95. |
Pr_Med_cut |
Cut off of mJAM posterior mediation probability (P(Med)) during credible set construction. Low P(Med) may indicate low correlation between the candidate SNP and the index SNP. Any candidate credible set SNPs with P(Med) < Pr_Med_cut will be not be considered for credible set. Default is 0. |
filter_rare |
A logical variable indicating whether to filter rare SNPs before the analysis. Default is 'FALSE.' If 'TRUE', then please specify 'rare_freq'. |
rare_freq |
A vector of frequencies between 0 and 0.5 to specify the minor allele frequency cut-off if you want to filter rare SNPs before the analysis. Please also set 'filter_rare' to be TRUE. For example, if there are 3 populations, then rare_freq = c(0.01, 0, 0.01) means SNPs with MAF < 0.01 in pop 1 and MAF < 0.01 in pop 3 will be removed from analysis. |
filter_unstable_est |
whether to filter variants with inconsistent estimate between mJAM and meta-analysis. |
use_robust_var_est |
whether to use the robust estimate of residual variance (weighting between median and individual estimates). |
A table listing all the selected index SNP(s) ('SNP'), along with their log10(p-value) conditional on all SNP(s) above ('cond_log10p'), the log10(p-value) conditional on all other index SNP(s) ('final_log10p'), and the p-value threshold used in this analysis ('pcut').
A table recording various posterior probabilities of all SNPs being considered for credible set SNPs.
A table with the marginal effect estimates and standard errors of all SNPs under the mJAM model.
The complete table of marginal effect estimates using fixed-effect model and mJAM model. For QC purpose only.
Jiayi Shen
Get conditional p-value under mJAM model
mJAM_get_condp( GItGI, GIty, yty, yty_med, N_GWAS, g = NULL, selected_id, use_robust_var_est = FALSE, use_median_yty_ethnic = NULL, rare_id = NULL )
mJAM_get_condp( GItGI, GIty, yty, yty_med, N_GWAS, g = NULL, selected_id, use_robust_var_est = FALSE, use_median_yty_ethnic = NULL, rare_id = NULL )
GItGI |
A list of transformed statistics from 'get_XtX()' for each study. |
GIty |
A list of transformed statistics from 'get_z()' for each study. |
yty |
A list of transformed statistics from 'get_yty()' for each study. |
yty_med |
A numeric vector of median yty across all SNPs within each study. |
N_GWAS |
A numeric vector of GWAS sample size for each study. |
g |
Hyperparameter in g-prior. If 'NULL', it will be set to 'sum(N_GWAS)'. |
selected_id |
A numeric vector of IDs of previously selected index SNP(s). |
use_robust_var_est |
whether to use linear combination of median yty and individual yty. |
use_median_yty_ethnic |
A numeric vector of study index in which median_yty is used for all SNPs in 'selected_id'. |
rare_id |
A numeric vector of IDs for rare SNP(s) which we do not apply weighting. Instead, we use the individual estimate of yty for these SNPs for robustness. |
The index of which SNP has the smallest conditional p-value.
The smallest conditional p-value.
A vector of all conditional p-values.
A vector of all conditional effect estimates.
A vector of standard errors of all the conditional effect estimates.
A complete matrix recording all conditional effect est & se for testing SNPs and 'selected_id'.
Jiayi Shen
Get conditional p-value for selected (index SNPs) under mJAM model
mJAM_get_condp_selected( GItGI, GIty, yty, yty_med, N_GWAS, g = NULL, selected_id, use_robust_var_est = FALSE, use_median_yty_ethnic = NULL, rare_SNPs = NULL )
mJAM_get_condp_selected( GItGI, GIty, yty, yty_med, N_GWAS, g = NULL, selected_id, use_robust_var_est = FALSE, use_median_yty_ethnic = NULL, rare_SNPs = NULL )
GItGI |
A list of transformed statistics from 'get_XtX()' for each study. |
GIty |
A list of transformed statistics from 'get_z()' for each study. |
yty |
A list of transformed statistics from 'get_yty()' for each study. |
yty_med |
A numeric vector of median yty across all SNPs within each study. |
N_GWAS |
A numeric vector of GWAS sample size for each study. |
g |
Hyperparameter in g-prior. If 'NULL', it will be set to 'sum(N_GWAS)'. |
selected_id |
A numeric vector of IDs of previously selected index SNP(s). |
use_robust_var_est |
whether to use linear combination of median yty and individual yty. (only for mJAM-Forward) |
use_median_yty_ethnic |
A numeric vector of study index in which median_yty is used for all SNPs in 'selected_id'. |
rare_SNPs |
A character vector for rare SNP(s) which we do not apply weighting. Instead, we use the individual estimate of yty for these SNPs for robustness. |
The estimated conditional effect size when all SNPs in 'selected_id' are in one mJAM model.
The variance of 'b_joint'.
A vector of all conditional p-values for 'b_joint'.
Jiayi Shen
Also apply weighting to get robust estimates of yty
mJAM_get_PrM( GItGI, GIty, yty, yty_med, N_GWAS, C_id, prev_X_list = NULL, g = NULL, rare_SNPs = NULL, use_robust_var_est = FALSE )
mJAM_get_PrM( GItGI, GIty, yty, yty_med, N_GWAS, C_id, prev_X_list = NULL, g = NULL, rare_SNPs = NULL, use_robust_var_est = FALSE )
GItGI |
A list of transformed statistics from 'get_XtX()' for each study. |
GIty |
A list of transformed statistics from 'get_z()' for each study. |
yty |
A list of transformed statistics from 'get_yty()' for each study. |
yty_med |
A numeric vector of median yty across all SNPs within each study. |
N_GWAS |
A numeric vector of GWAS sample size for each study. |
C_id |
An ingeter vector of IDs for the SNPs to be tested. |
prev_X_list |
A numeric vector of the ID(s) of previously selected index SNP(s). |
g |
The pre-specified 'g' in 'g'-prior formulation. |
rare_SNPs |
A numeric vector of ID(s) for rare SNP(s) which we do not apply weighting. Instead, we use the individual estimate of yty for these SNPs for robustness. |
use_robust_var_est |
whether to use linear combination of median yty and individual yty. |
Posterior Pr(Model) for each SNPs in 'C_id'.
R2 estimates of every one-SNP model (one for each SNPs in 'C_id').
An integer vector of how many studies have missing values for each SNP.
Jiayi Shen
Also apply weighting to get robust estimates of yty
mJAM_get_PrM_Wald( GItGI, GIty, yty, yty_med, N_GWAS, C_id, prev_X_list = NULL, g = NULL, rare_SNPs = NULL, use_robust_var_est = FALSE )
mJAM_get_PrM_Wald( GItGI, GIty, yty, yty_med, N_GWAS, C_id, prev_X_list = NULL, g = NULL, rare_SNPs = NULL, use_robust_var_est = FALSE )
GItGI |
A list of transformed statistics from 'get_XtX()' for each study. |
GIty |
A list of transformed statistics from 'get_z()' for each study. |
yty |
A list of transformed statistics from 'get_yty()' for each study. |
yty_med |
A numeric vector of median yty across all SNPs within each study. |
N_GWAS |
A numeric vector of GWAS sample size for each study. |
C_id |
An ingeter vector of IDs for the SNPs to be tested. |
prev_X_list |
A numeric vector of the ID(s) of previously selected index SNP(s). |
g |
The pre-specified 'g' in 'g'-prior formulation. |
rare_SNPs |
A numeric vector of ID(s) for rare SNP(s) which we do not apply weighting. Instead, we use the individual estimate of yty for these SNPs for robustness. |
use_robust_var_est |
whether to use linear combination of median yty and individual yty. |
A numeric vector of posterior Pr(Model) for each SNPs in 'C_id'.
Jiayi Shen
Also apply weighting to get robust estimates of yty
mJAM_get_PrMed( GItGI, GIty, yty, yty_med, N_GWAS, g = NULL, C_id, X_id, prev_X_list )
mJAM_get_PrMed( GItGI, GIty, yty, yty_med, N_GWAS, g = NULL, C_id, X_id, prev_X_list )
GItGI |
A list of transformed statistics from 'get_XtX()' for each study. |
GIty |
A list of transformed statistics from 'get_z()' for each study. |
yty |
A list of transformed statistics from 'get_yty()' for each study. |
yty_med |
A numeric vector of median yty across all SNPs within each study. |
N_GWAS |
A numeric vector of GWAS sample size for each study. |
g |
The pre-specified 'g' in 'g'-prior formulation. |
C_id |
An ingeter vector of IDs for the SNPs to be tested. |
X_id |
An integer specifying the ID of the index SNP. |
prev_X_list |
A numeric vector of the ID(s) of previously selected index SNP(s). |
Posterior Pr(Mediation) for each SNPs in C_id.
Posterior mediation effect size for each SNPs in C_id.
Posterior variance of mediation effect in models with both C and X.
Posterior variance of mediation effect in models with C only.
Jiayi Shen
Pruning SNPs based on LD
mJAM_LDpruning(target, testing, R, within_thre = 0.95, across_thre = 0.8)
mJAM_LDpruning(target, testing, R, within_thre = 0.95, across_thre = 0.8)
target |
Target SNP ID. |
testing |
IDs of SNPs to be tested. |
R |
a list of correlation matrix of all SNPs. |
within_thre |
threshold of r2 with selected index SNP(s) within a single population. If a SNP's correlation with any selected index SNP is greater than this threshold in at least one population, it will be excluded from subsequent rounds of index SNP selection. |
across_thre |
threshold of r2 with selected index SNP(s) across all populations. If a SNP's correlation with any selected index SNP is greater than this threshold in all populations, it will be excluded from subsequent rounds of index SNP selection. |
SNP IDs to be pruned due to high within-population correlation
SNP IDs to be pruned due to high across-population correlation
Jiayi Shen
fitting mJAM-SuSiE
mJAM_SuSiE( Marg_Result = NULL, EAF_Result = NULL, N_GWAS, X_ref, filter_rare = FALSE, rare_freq = NULL, SuSiE_num_comp = 10, SuSiE_coverage = 0.95, SuSiE_min_abs_corr = 0.5, max_iter = 500, estimate_residual_variance = F )
mJAM_SuSiE( Marg_Result = NULL, EAF_Result = NULL, N_GWAS, X_ref, filter_rare = FALSE, rare_freq = NULL, SuSiE_num_comp = 10, SuSiE_coverage = 0.95, SuSiE_min_abs_corr = 0.5, max_iter = 500, estimate_residual_variance = F )
Marg_Result |
A data frame with marginal summary statistics from all studies. Col1: SNP name; Col2: Effect sizes from study #1; Col3: Std Errors of effect sizes from study #1; ... |
EAF_Result |
A data frame with effect allele frequency (EAF) from all studies. Col1: SNP name; Col2: EAF from study #1; Col3: EAF from study #2; ... |
N_GWAS |
A vector of sample sizes in all original GWAS studies. |
X_ref |
A list of matrices with individual-level SNP dosage data in each study/population. |
filter_rare |
A logical variable indicating whether to filter rare SNPs before the analysis. Default is 'FALSE.' If 'TRUE', then please specify 'rare_freq'. |
rare_freq |
A vector of frequencies between 0 and 0.5 to specify the minor allele frequency cut-off if you want to filter rare SNPs before the analysis. Please also set 'filter_rare' to be TRUE. For example, if there are 3 populations, then rare_freq = c(0.01, 0, 0.01) means SNPs with MAF < 0.01 in pop 1 and MAF < 0.01 in pop 3 will be removed from analysis. |
SuSiE_num_comp |
SuSiE argument. The maximum number of causal SNPs that you want to select. Default is 10. |
SuSiE_coverage |
SuSiE argument. The required coverage of credible sets. Default is 0.95. |
SuSiE_min_abs_corr |
SuSiE argument. Minimum absolute correlation allowed in a credible set. |
max_iter |
SuSiE argument. Maximum iterations to perform. |
estimate_residual_variance |
SuSiE argument. If 'TRUE', then the susie algorithm is updating residual variance estimate during iterations. If 'FALSE', then use the residual variance is a fixed value, which is usually var(Y). |
A table of the SuSiE posterior inclusion probabilities (PIPs), posterior mean, and posterior sd of all SNPs.
SuSiE fit object.
Jiayi Shen
Get and tidy SuSiE credible sets
mJAM_SuSiE_get_cs(mjam_susie_res, coverage = 0.95)
mJAM_SuSiE_get_cs(mjam_susie_res, coverage = 0.95)
mjam_susie_res |
The mJAM-SuSiE result returned from 'mJAM_SuSiE()' |
coverage |
A number between 0 and 1 specifying the “coverage” of the estimated confidence sets. |
A table summary of SuSiE credible sets with the following columns:
#'
The label for a distinct credible set.
The empirical coverage of this credible set.
The number of SNPs in total in corresponding credible set.
The name of the index SNP (SNP with highest posterior probability) in corresponding credible set.
The names of individual SNPs selected in this credible set.
Jiayi Shen
Keep the output as three digits
output.format(x, ...)
output.format(x, ...)
x |
input |
... |
other options you want to put in |
Lai Jiang
Real data for selecting the metabolites for the risk of prostate cancer
The PrCa.lipids
is a set of data sets which was for selecting the metabolites for the risk of prostate cancer
The marginal matrix with 118 metabolites and 144 SNPs. This data is directly adapted from https://github.com/verena-zuber/demo_AMD (Zuber et al., 2020)
The conditional matrix with 118 metabolites and 144 SNPs, which was composed by SuSiE JAM and the marginal
matrix.
The reference genotype data for the 144 SNPs from the European-ancestry population in 1000 Genome Project (Consortium, 2015).
The b vector. The association estimates between selected SNPs and the risk of prostate cancer from (Schumacher et al., 2018)
The se(b) vector from (Schumacher et al., 2018)
The pvalues vector of the association estimates between selected SNPs and the risk of prostate cancer from (Schumacher et al., 2018)
The vector of the effect allele frequency of the SNPs from (Schumacher et al., 2018)
The RSID of the SNPs.
Consortium GP. A global reference for human genetic variation. Nature 2015; 526: 68.
Zuber, Verena, et al. Selecting likely causal risk factors from high-throughput experiments using multivariable Mendelian randomization. Nature communications 11.1 (2020): 1-11.
Schumacher, Fredrick R., et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nature genetics 50.7 (2018): 928-936.
Print out for EN-hJAM
## S3 method for class 'ENhJAM' print(x, ...)
## S3 method for class 'ENhJAM' print(x, ...)
x |
obejct output from ENhJAM |
... |
other options you want to put in |
Lai Jiang
Print out for hJAM_lnreg
## S3 method for class 'hJAM' print(x, ...)
## S3 method for class 'hJAM' print(x, ...)
x |
object output by hJAM |
... |
other options you want to put in |
Lai Jiang
Print out for hJAM_egger
## S3 method for class 'hJAM_egger' print(x, ...)
## S3 method for class 'hJAM_egger' print(x, ...)
x |
obejct output from hJAM_egger |
... |
other options you want to put in |
Lai Jiang
Print out for SHA-JAM
## S3 method for class 'SHAJAM' print(x, ...)
## S3 method for class 'SHAJAM' print(x, ...)
x |
obejct output from SHAJAM |
... |
other options you want to put in |
Lai Jiang
Function to implement SHA-JAM
SHAJAM( betas.Gy, betas_se.Gy = NULL, N.Gy, eaf.Gy = NULL, Geno, A, L.cs = NULL, min_abs_corr = NULL, coverage = 0.95, estimate_residual_variance = TRUE, max_iter = 500 )
SHAJAM( betas.Gy, betas_se.Gy = NULL, N.Gy, eaf.Gy = NULL, Geno, A, L.cs = NULL, min_abs_corr = NULL, coverage = 0.95, estimate_residual_variance = TRUE, max_iter = 500 )
betas.Gy |
The betas in the paper: the marginal effects of SNPs on the phenotype (Gy) |
betas_se.Gy |
The standard errors of the betas |
N.Gy |
The sample size of the GWAS where you obtain the betas.Gy and betas_se.Gy |
eaf.Gy |
The reference allele frequency of the SNPs in betas.Gy |
Geno |
The individual level data of the reference panel. Must have the same order of SNPs as in the betas.Gy. |
A |
The conditional A matrix. |
L.cs |
The largest number of credible set allowed in SHA-JAM. Required by SHA-JAM. |
min_abs_corr |
The requested minimum absolute correlation coefficient between intermediates within one credible set. Required by SHA-JAM. |
coverage |
The coverage of credible set. Default is 0.95. Required by SHA-JAM. |
estimate_residual_variance |
If estimate the residual variance in the fitting procedure of SHA-JAM. Default as TRUE. Required by SHA-JAM. |
max_iter |
The number of maximum iterations in fitting SHA-JAM. Required by SHA-JAM. |
An object of the SHAJAM
The number of SNPs used in the analysis.
The number of intermediates in the analysis.
The number of selected intermediates, regardless of the credible sets.
The label/name for each selected intermediates.
The coefficients of selected intermediates.
The posterior inclusion probability of each selected intermediate.
Number of credible sets.
The label/name for all candidate intermediates.
The posterior inclusion probability of all candidate intermediates.
The coefficients of all candidate intermediates.
The purity of the credibel set selected.
Lai Jiang
To generate the heatmap of all the SNPs that the user use in the analysis
SNPs_heatmap(Geno, show.variables = FALSE, x.axis.angel = 90)
SNPs_heatmap(Geno, show.variables = FALSE, x.axis.angel = 90)
Geno |
The reference panel (Geno) of the SNPs that the user use in the analysis, such as 1000 Genome |
show.variables |
Select to show the variables name or not. Default set to be FALSE. |
x.axis.angel |
The angel for displaying the X axis. Default set to be 90. |
Lai Jiang
data(MI.Rdata) SNPs_heatmap(Geno = MI.Geno[, 1: 10], show.variable = TRUE, x.axis.angel = 90)
data(MI.Rdata) SNPs_heatmap(Geno = MI.Geno[, 1: 10], show.variable = TRUE, x.axis.angel = 90)
To generate the scatter plot of the SNPs vs. one intermediate that the user use in the analysis
SNPs_scatter_plot(alphas, betas.Gy, X.label = NULL)
SNPs_scatter_plot(alphas, betas.Gy, X.label = NULL)
alphas |
The effects of SNPs on the intermediate (i.e. exposure/risk factor) (Gx). |
betas.Gy |
The betas in the paper: the marginal effects of SNPs on the phenotype (Gy) |
X.label |
The label of the intermediate (i.e. exposure/risk factor). Default is NULL. |
A set of scatter plots with x-axis being the conditional estimates for each
intermediate and y-axis being the
estimates.
Lai Jiang
data(MI) SNPs_scatter_plot(alphas = MI.Amatrix[, 1], betas.Gy = MI.betas.gwas, X.label = "BMI")
data(MI) SNPs_scatter_plot(alphas = MI.Amatrix[, 1], betas.Gy = MI.betas.gwas, X.label = "BMI")
Get SuSiE posterior mean
susie_get_posterior_mean_v2(res, prior_tol = 1e-09)
susie_get_posterior_mean_v2(res, prior_tol = 1e-09)
res |
A SuSiE fit object |
prior_tol |
When the prior variance is estimated, compare the estimated value to prior_tol at the end of the computation, and exclude a single effect from PIP computation if the estimated prior variance is smaller than this tolerance value. |
A vector of posterior mean effects
Get SuSiE posterior sd
susie_get_posterior_sd_v2(res, prior_tol = 1e-09)
susie_get_posterior_sd_v2(res, prior_tol = 1e-09)
res |
A SuSiE fit object |
prior_tol |
When the prior variance is estimated, compare the estimated value to prior_tol at the end of the computation, and exclude a single effect from PIP computation if the estimated prior variance is smaller than this tolerance value. |
A vector of posterior standard deviations
The susieJAM_A
function is to get the conditional A matrix by using marginal A matrix
susieJAM_A( marginalA, marginalA_se, N.Gx, eaf.Gy = NULL, Geno, inclusion.indicator, L.cs, min_abs_corr, max_iter, coverage, estimate_residual_variance = TRUE )
susieJAM_A( marginalA, marginalA_se, N.Gx, eaf.Gy = NULL, Geno, inclusion.indicator, L.cs, min_abs_corr, max_iter, coverage, estimate_residual_variance = TRUE )
marginalA |
the marginal effects of SNPs on the exposures (Gx). |
marginalA_se |
the standard error of the marginal effects of SNPs on the exposures (Gx). |
N.Gx |
the sample size of each Gx. It can be a scalar or a vector. If there are multiple X's from different Gx, it should be a vector including the sample size of each Gx. If all alphas are from the same Gx, it could be a scalar. |
eaf.Gy |
the effect allele frequency of the SNPs in the Gx data. |
Geno |
the reference panel (Geno), such as 1000 Genome。 |
inclusion.indicator |
The matrix of inclusion indicator of SNPs for each intermediate. Included as 1; otherwise 0. |
L.cs |
A susie input parameter. Number of components (nonzero elements) in the SuSiE regression model. If L.cs is larger than the number of covariate (p), L.cs is set to p. |
min_abs_corr |
A susie input parameter. Minimum of absolute value of correlation allowed in a credible set. The default, 0.5, corresponds to squared correlation of 0.25, which is a commonly used threshold for genotype data in genetics studies. |
max_iter |
Maximum number of iterations in SuSiE fitting. |
coverage |
Default as 0.95.The coveralge level of the credible set. |
estimate_residual_variance |
Default as TRUE. Estimate the residual variance in each iteration of SuSiE fitting. |
A matrix with conditional estimates which are converted from marginal estimates using the susie JAM model.
Lai Jiang
data(GTEx.PrCa) susieJAM_A(marginalA = GTEx.PrCa.marginal.A[, 1:9], marginalA_se = GTEx.PrCa.marginal.A.se[, 1:9], eaf.Gy = GTEx.PrCa.maf.gwas, Geno = GTEx.PrCa.Geno, inclusion.indicator = GTEx.PrCa.inclusion.indicator, N.Gx = 620, L.cs = 10, min_abs_corr = 0.5)
data(GTEx.PrCa) susieJAM_A(marginalA = GTEx.PrCa.marginal.A[, 1:9], marginalA_se = GTEx.PrCa.marginal.A.se[, 1:9], eaf.Gy = GTEx.PrCa.maf.gwas, Geno = GTEx.PrCa.Geno, inclusion.indicator = GTEx.PrCa.inclusion.indicator, N.Gx = 620, L.cs = 10, min_abs_corr = 0.5)
The susieJAM_alphas
function is to perform the variable selection and compute the selected conditional alpha vector for one intermediate.
If only one intermediate in the model, please use susieJAM_alphas instead of susieJAM_A
susieJAM_alphas( marginalA, marginalA_se, N.Gx, eaf.Gy = NULL, Geno, L.cs = 10, min_abs_corr = 0.6, max_iter = 100, coverage = 0.95, estimate_residual_variance = FALSE )
susieJAM_alphas( marginalA, marginalA_se, N.Gx, eaf.Gy = NULL, Geno, L.cs = 10, min_abs_corr = 0.6, max_iter = 100, coverage = 0.95, estimate_residual_variance = FALSE )
marginalA |
the marginal effects of SNPs on one exposure (Gx). |
marginalA_se |
the standard error of the marginal effects of SNPs on one outcome (Gx). |
N.Gx |
the sample size of the Gx. It can be a scalar. |
eaf.Gy |
The vector of the minor allele frequency or effect allele frequency in the GWAS. |
Geno |
the reference panel (Geno), such as 1000 Genome. The reference data has to be centered. |
L.cs |
A susie input parameter. Number of components (nonzero elements) in the SuSiE regression model. If L.cs is larger than the number of covariate (p), L.cs is set to p. |
min_abs_corr |
A susie input parameter. Minimum of absolute value of correlation allowed in a credible set. The default, 0.5, corresponds to squared correlation of 0.25, which is a commonly used threshold for genotype data in genetics studies. |
max_iter |
Maximum number of iterations in SuSiE fitting. |
coverage |
Default as 0.95.The coveralge level of the credible set. |
estimate_residual_variance |
Default as TRUE. Estimate the residual variance in each iteration of SuSiE fitting. |
Lai Jiang
data(GTEx.PrCa) include.SNPs = which(GTEx.PrCa.inclusion.indicator[,1]==1) susieJAM_alphas(marginalA = GTEx.PrCa.marginal.A[include.SNPs, 1], marginalA_se = GTEx.PrCa.marginal.A.se[include.SNPs, 1], eaf.Gy = GTEx.PrCa.maf.gwas[include.SNPs], Geno = GTEx.PrCa.Geno[, include.SNPs], N.Gx = 620, L.cs = 10, min_abs_corr = 0.5)
data(GTEx.PrCa) include.SNPs = which(GTEx.PrCa.inclusion.indicator[,1]==1) susieJAM_alphas(marginalA = GTEx.PrCa.marginal.A[include.SNPs, 1], marginalA_se = GTEx.PrCa.marginal.A.se[include.SNPs, 1], eaf.Gy = GTEx.PrCa.maf.gwas[include.SNPs], Geno = GTEx.PrCa.Geno[, include.SNPs], N.Gx = 620, L.cs = 10, min_abs_corr = 0.5)