hJAM is an R package that implements the hJAM model, which estimates the associations between multiple intermediates and an outcome in a Mendelian randomization or transcriptome-wide association analysis.
Mendelian randomization (MR) and transcriptome-wide association studies (TWAS) can be viewed as the same approach within the instrumental variable analysis framework using genetic variants. They differ in their intermediates: MR focuses on modifiable risk factors, while TWAS focuses on gene expression. A two-stage hierarchical model unifies MR and TWAS in a single framework. Details are described in our paper.
The package provides two methods:
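As a rough orientation, a two-stage hierarchical model of this kind is often written in the form below. The notation is an assumption made for illustration (Y for the outcome, G for the SNP genotypes, β for the SNP-outcome effects, A for the SNP-intermediate effects, π for the intermediate-outcome effects of interest); the paper remains the authoritative source for the exact specification.

```latex
\begin{aligned}
\text{Stage 1:}\quad & Y = G\,\beta + \varepsilon, \\
\text{Stage 2:}\quad & \beta = A\,\pi + \delta .
\end{aligned}
```

Under this reading, MR and TWAS differ only in what the columns of A represent: SNP effects on risk factors in MR, and SNP effects on gene expression in TWAS.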
hJAM
hJAM_egger
Both hJAM methods take the following inputs:
betas.Gy: the vector of marginal effect estimates of the SNPs on the outcome. It can be extracted directly from a GWAS of the outcome of interest.
N.Gy: the sample size of the GWAS from which betas.Gy was extracted.
Gl: the reference genotype matrix (passed as the Geno argument in the function calls shown later), with the number of columns equal to the number of SNPs in the instrument set. It can come from publicly available genotype data, such as the 1000 Genomes Project. Users need to confirm that the genotype matrix is in dosage format. For VCF files, users can use VCFtools to convert them into dosage format.
A: the alpha matrix of conditional effect estimates of the SNPs on the intermediates. The number of columns of A equals the number of intermediates in the user's research question. A conditional A matrix can be converted from a marginal A matrix with the get_cond_A function in the hJAM package, as described later.
ridgeTerm: the ridge term added to the diagonal of the Cholesky decomposition component matrix L to force it to be positive definite. Please see our paper for details.
To generate a conditional estimate Â matrix from a marginal estimate Â matrix, users can use the get_cond_A function (if the number of intermediates > 1) or the get_cond_alpha function (if the number of intermediates = 1) in the hJAM package. Examples are given in the next section.
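To make the difference between marginal and conditional estimates concrete, here is a minimal base R sketch on simulated data. It is not the hJAM implementation: the variable names and the simulation are assumptions, and it uses the individual-level genotypes directly, whereas in practice a reference panel stands in for them. It only shows that one-SNP-at-a-time estimates, combined with the genotype cross-product matrix, recover the joint (conditional) estimates.

```r
# Illustrative sketch only: simulated data, base R, not the hJAM code.
set.seed(1)
n  <- 2000
g1 <- rbinom(n, 2, 0.3)
# g2 copies g1 most of the time to mimic correlated SNPs (LD)
g2 <- ifelse(runif(n) < 0.7, g1, rbinom(n, 2, 0.3))
g3 <- rbinom(n, 2, 0.4)
G  <- cbind(g1, g2, g3)                          # dosage matrix (0/1/2)
x  <- as.vector(G %*% c(0.2, 0, 0.1) + rnorm(n)) # simulated intermediate

Gc <- scale(G, center = TRUE, scale = FALSE)     # mean-centered genotypes
xc <- x - mean(x)

# Marginal estimates: slope from regressing x on each SNP separately
marginal <- colSums(Gc * xc) / colSums(Gc^2)

# Rebuild z = G'x from the marginal estimates, then solve for the
# conditional (joint) estimates using the genotype cross-product G'G
z           <- colSums(Gc^2) * marginal
conditional <- as.vector(solve(crossprod(Gc), z))

# Reference: joint regression on all SNPs at once
joint <- unname(coef(lm(x ~ G))[-1])

print(cbind(marginal = unname(marginal), conditional, joint))
```

In this simulation the second SNP has no effect of its own, so its marginal estimate is inflated by its correlation with the first SNP while its conditional estimate is close to zero.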
For MR questions, the intermediates are modifiable risk factors. The marginal Â can be extracted from different GWAS whose outcomes are the risk factors of interest. For example, when the intermediate is body mass index, the marginal α̂ vector can be extracted from the GIANT consortium.
For TWAS questions, the intermediates are gene expression levels. There are two ways to obtain the elements of the Â matrix.
GTEx portal: the GTEx project provides marginal summary statistics between SNPs and gene expression in different tissues.
PredictDB: PredictDB is developed by the PrediXcan group. It uses elastic net on individual-level data from the GTEx project.
Implementation with caution:
When using the hJAM_egger function, make sure that the directions of the association estimates in the Â matrix are positive. It is possible that some SNPs cannot be made positive because of the reverse effects between intermediates on the outcome.
In our package, we prepared a data example, which we have described in detail in our paper. In this data example, we focus on the conditional effects of body mass index (BMI) and type 2 diabetes (T2D) on myocardial infarction (MI).
We identified 75 SNPs significantly associated with BMI from the GIANT consortium and 136 SNPs significantly associated with T2D from DIAGRAM+GERA+UKB. One SNP appeared in both instrument sets, which leaves 210 SNPs in total. The association estimates between the 210 SNPs and MI were collected from UK Biobank.
In this package, we include two functions that let users visually check the SNPs used in the analysis:
SNPs_scatter_plot
SNPs_heatmap
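The kind of visual check these two functions are meant to support can be approximated in base R. The sketch below does not call SNPs_scatter_plot or SNPs_heatmap and makes no claim about their arguments or output. It assumes that the heatmap is about pairwise correlation among SNPs and that the scatter plot compares SNP-intermediate estimates with SNP-outcome estimates. All data and names are simulated placeholders.

```r
# Illustrative sketch only: simulated genotypes and made-up estimates.
set.seed(42)
n  <- 300
g1 <- rbinom(n, 2, 0.3)
g2 <- ifelse(runif(n) < 0.8, g1, rbinom(n, 2, 0.3))  # correlated with g1
g3 <- rbinom(n, 2, 0.5)
G  <- cbind(snp1 = g1, snp2 = g2, snp3 = g3)

# Pairwise correlation among SNP dosages
ld <- cor(G)
image(seq_len(ncol(G)), seq_len(ncol(G)), ld, zlim = c(-1, 1),
      xlab = "SNP", ylab = "SNP", main = "Pairwise SNP correlation")

# Hypothetical SNP-intermediate and SNP-outcome estimates
alpha_hat <- c(0.05, 0.02, 0.08)
beta_hat  <- c(0.015, 0.004, 0.030)
plot(alpha_hat, beta_hat,
     xlab = "SNP effect on intermediate",
     ylab = "SNP effect on outcome")
```

Strongly correlated SNPs show up as bright off-diagonal cells in the first plot, and SNPs that depart from the overall trend stand out in the second.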
You can use the get_cond_A function to run JAM on the marginal estimate Â matrix and convert it into a conditional estimate Â matrix.
cond_A = JAM_A(marginalA = MI.marginal.Amatrix, Geno = MI.Geno, N.Gx = c(339224, 659316), ridgeTerm = TRUE)
cond_A[1:10, ]
## bmi t2d
## [1,] 0.019531085 0.072587479
## [2,] 0.025262061 0.013586305
## [3,] -0.005147363 0.089673869
## [4,] 0.046302578 0.041313424
## [5,] 0.016849395 -0.004564612
## [6,] 0.012345062 0.036207992
## [7,] 0.042065546 0.013786564
## [8,] 0.073772221 0.118579096
## [9,] -0.004559686 0.051510597
## [10,] -0.012843666 0.091789312
The default version of hJAM restricts the intercept to be zero.
hJAM::hJAM(betas.Gy = MI.betas.gwas, Geno = MI.Geno, N.Gy = 459324, A = MI.Amatrix, ridgeTerm = TRUE) # 459324 is the sample size of the UK Biobank GWAS of MI
## ------------------------------------------------------
## hJAM output
## ------------------------------------------------------
## Number of SNPs used in model: 210
##
## Estimate StdErr 95% CI Pvalue
## bmi 0.322 0.061 (0.202, 0.441) 1.268210e-07
## t2d 0.119 0.017 (0.086, 0.153) 3.176604e-12
## ------------------------------------------------------
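The columns of this table can be related to one another with a short calculation. The sketch below assumes a normal (Wald) approximation, which may differ from what the package computes internally, and it starts from the rounded estimates and standard errors printed above, so the last digit can differ from the table.

```r
# Illustrative sketch only: Wald-type intervals from the rounded output above.
est <- c(bmi = 0.322, t2d = 0.119)   # estimates copied from the table
se  <- c(bmi = 0.061, t2d = 0.017)   # standard errors copied from the table

zcrit <- qnorm(0.975)                # two-sided 95% critical value
ci    <- cbind(lower = est - zcrit * se,
               upper = est + zcrit * se)
pval  <- 2 * pnorm(-abs(est / se))   # two-sided p-value, normal approximation

print(round(ci, 3))
print(signif(pval, 3))
```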
Another method in this package is hJAM with Egger regression, which is analogous to MR-Egger. It allows the intercept to be non-zero.
hJAM::hJAM_egger(betas.Gy = MI.betas.gwas, Geno = MI.Geno, N.Gy = 459324, A = MI.Amatrix, ridgeTerm = TRUE) # 459324 is the sample size of the UK Biobank GWAS of MI
## ------------------------------------------------------
## hJAM egger output
## ------------------------------------------------------
## Number of SNPs used in model: 210
##
## Estimate StdErr 95% CI Pvalue
## bmi 0.262 0.071 (0.123, 0.401) 0
## t2d 0.089 0.025 (0.039, 0.139) 0
## Intercept 0.003 0.002 (-0.001, 0.007) 0.107
## ------------------------------------------------------
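The earlier caution about effect directions for hJAM_egger can be illustrated with a toy example. The sketch below assumes a single intermediate and made-up numbers, and it shows one possible way to orient alleles, not something the package is known to do for you: where a SNP's effect on the intermediate is negative, the effect allele is switched, which flips the sign of both estimates and recodes the dosage as 2 minus the original dosage.

```r
# Illustrative sketch only: hypothetical summary statistics, one intermediate.
alpha <- c(0.05, -0.03, 0.08, -0.01)   # SNP effects on the intermediate
beta  <- c(0.02, -0.01, 0.03, 0.004)   # SNP effects on the outcome
G <- matrix(c(0, 1, 2, 1,
              2, 1, 0, 0,
              1, 1, 2, 0,
              0, 2, 1, 1), ncol = 4)   # dosages, one column per SNP

flip <- alpha < 0                      # SNPs whose effect allele is switched

alpha_oriented <- ifelse(flip, -alpha, alpha)
beta_oriented  <- ifelse(flip, -beta,  beta)

G_oriented <- G
G_oriented[, flip] <- 2 - G[, flip]    # count the other allele instead

print(rbind(alpha_oriented, beta_oriented))
```

With several intermediates, as in the BMI and T2D example, a single switch per SNP changes the sign of a whole row of the matrix, so it is not always possible to make every entry positive.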
We have presented the main usage of the hJAM package. For more details about each function, please check the package documentation. If you would like to give us feedback or report an issue, please tell us on GitHub.