Mixture Multivariate Distribution Estimate

This workflow computes a prior that is independent of the specific analysis method chosen for the data. This foundational step supports several fitting techniques, including UDR, ED, and TED, as well as initialization with FLASHier. The goal is to fit a mixture model that extracts meaningful signals from the data.

An earlier version of the approach is outlined in Urbut et al. 2019. This workflow implements a few improvements, including additional EBMF methods and the new udr (Ultimate Deconvolution in R) package for fitting the mixture model.

After the priors are computed, the model is fit and posteriors are calculated for the variables of interest. The objective is a multivariate analysis under the MASH (multivariate adaptive shrinkage) model, using an analysis that improves on the one described in Urbut et al. 2019.

Input:

--data: rds file. For example, calling str() on the contents of MWE.rds gives:

List of 10
 $ random.z: num [1:184, 1:6] 1.527 -0.282 0.365 -1.5 0.548 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
 $ strong.z: num [1:46, 1:6] -0.47 1.601 -1.351 -0.298 1.114 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
 $ null.z  : num [1:184, 1:6] -0.0966 -0.5293 0.854 -0.5985 -0.0601 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
 $ random.b: num [1:184, 1:6] 0.1161 -0.0232 0.0248 -0.1286 0.0417 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
 $ strong.b: num [1:46, 1:6] -0.0788 0.0824 -0.0904 -0.0548 0.1429 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
 $ null.b  : num [1:184, 1:6] -0.01337 -0.0818 0.09428 -0.04075 -0.00255 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
 $ null.s  : num [1:184, 1:6] 0.1384 0.1545 0.1104 0.0681 0.0424 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
 $ random.s: num [1:184, 1:6] 0.076 0.0822 0.068 0.0857 0.0762 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
 $ strong.s: num [1:46, 1:6] 0.1676 0.0515 0.0669 0.1837 0.1283 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
 $ XtX     : num [1:6, 1:6] 577.7 196.9 165.4 22.5 313.8 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...
  .. ..$ : chr [1:6] "Ast" "Exc" "Inh" "Mic" ...

--vhat: “identity”, “simple”, “mle”, “vhat_corshrink_xcondition”, or “vhat_simple_specific”

--cwd: output path

--vhat-data: for mash_fit.ipynb mash, vhat data in an rds file from the mixture_prior.ipynb step

--prior-data: for mash_fit.ipynb mash, prior data in an rds file from the mixture_prior.ipynb step

--compute-posterior: for mash_fit.ipynb mash, flag indicating whether the posterior quantities should be computed

Overview#

  1. Compute MASH prior

  2. MASH fit

  3. Generate Plots

Steps#

i. Compute MASH prior#

sos run $PATH/mixture_prior.ipynb ed_bovy \
    --output-prefix MWE_ed_bovy \
    --data $PATH/MWE.rds \
    --cwd $PATH/output/ \
    --vhat mle
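
The --vhat options listed under Input can be compared by running the prior step once per option. The following is a minimal dry-run sketch that only prints the commands. WORKDIR is a hypothetical variable standing in for the directory written as $PATH in the commands above (in a real shell, $PATH is the executable search path, so a different name is safer), and the per-option output prefix is an illustrative naming choice, not something the workflow requires.

```shell
# Dry run: print one mixture_prior.ipynb command per --vhat option.
# WORKDIR is a hypothetical placeholder for the directory shown as $PATH above.
WORKDIR="${WORKDIR:-.}"

print_prior_cmds() {
    for vhat in identity simple mle vhat_corshrink_xcondition vhat_simple_specific; do
        # The per-option output prefix is an assumption made for illustration.
        echo "sos run $WORKDIR/mixture_prior.ipynb ed_bovy" \
             "--output-prefix MWE_ed_bovy_$vhat" \
             "--data $WORKDIR/MWE.rds" \
             "--cwd $WORKDIR/output/ --vhat $vhat"
    done
}

print_prior_cmds
```

Each printed line can be reviewed and then executed by hand; nothing is run by the sketch itself.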

ii. MASH fit#

sos run $PATH/mash_fit.ipynb mash \
    --output-prefix MWE_ed_bovy_posterior \
    --data $PATH/MWE.rds \
    --vhat-data $PATH/MWE_ed_bovy.EZ.V_simple.rds \
    --prior-data $PATH/MWE_ed_bovy.EZ.prior.rds \
    --compute-posterior \
    --cwd $PATH/output/
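
Because the MASH fit reads the vhat and prior files produced by the prior step, it can help to confirm that both exist before launching it. The sketch below is one way to do that under stated assumptions: require_inputs and WORKDIR are hypothetical names that are not part of the workflow, and the command is only printed (dry run) rather than executed.

```shell
# Sketch: build the mash_fit.ipynb command only when both inputs exist.
# require_inputs and WORKDIR are hypothetical names, not part of the workflow.
require_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

WORKDIR="${WORKDIR:-.}"
VHAT="$WORKDIR/MWE_ed_bovy.EZ.V_simple.rds"
PRIOR="$WORKDIR/MWE_ed_bovy.EZ.prior.rds"

if require_inputs "$VHAT" "$PRIOR"; then
    # Dry run: remove the leading echo to execute the step.
    echo sos run "$WORKDIR/mash_fit.ipynb" mash \
        --output-prefix MWE_ed_bovy_posterior \
        --data "$WORKDIR/MWE.rds" \
        --vhat-data "$VHAT" \
        --prior-data "$PRIOR" \
        --compute-posterior \
        --cwd "$WORKDIR/output/"
else
    echo "run the prior step first" >&2
fi
```

The file names mirror those in the command above; adjust them if the prior step was run with a different output prefix.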

iii. Generate Plots#

sos run $PATH/mixture_prior.ipynb plot_U \
    --output-prefix protocol_example.mixture_plots \
    --data $PATH/MWE_ed_bovy.EZ.prior.rds \
    --cwd $PATH/output/

Anticipated Results#

i. Compute MASH prior

MWE_ed_bovy.EZ.prior.rds: rds file containing U, w and loglik.

MWE_ed_bovy.EZ.V_simple.rds: an NxN matrix

MWE_ed_bovy.canonical.rds: rds file containing multiple NxN matrices

MWE_ed_bovy.flash.model.rds: rds file containing a model and factors

MWE_ed_bovy.flash.rds: rds file containing three matrices:

  1. tFLASH_default

  2. FLASH_default_1

  3. FLASH_default_2

MWE_ed_bovy.flash_nonneg.model.rds: rds file containing a model and factors

MWE_ed_bovy.flash_nonneg.rds: rds file containing three matrices:

  1. tFLASH_nonneg

  2. FLASH_nonneg_1

  3. FLASH_nonneg_2

MWE_ed_bovy.pca.rds: rds file containing three matrices:

  1. PCA_1

  2. PCA_2

  3. tPCA
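
The prior-step outputs listed above can be checked mechanically after a run. The following is a small sketch, assuming the files sit in a single output directory and share one prefix; check_prior_outputs is a hypothetical helper, not part of the workflow.

```shell
# check_prior_outputs DIR PREFIX
# Prints each expected prior-step file that is absent; returns 1 if any is missing.
# Hypothetical helper: the suffix list is copied from the outputs listed above.
check_prior_outputs() {
    dir=$1
    prefix=$2
    rc=0
    for suffix in EZ.prior.rds EZ.V_simple.rds canonical.rds \
                  flash.model.rds flash.rds \
                  flash_nonneg.model.rds flash_nonneg.rds pca.rds; do
        if [ ! -f "$dir/$prefix.$suffix" ]; then
            echo "missing: $prefix.$suffix"
            rc=1
        fi
    done
    return "$rc"
}

# Example call; the directory is an assumption about where --cwd pointed.
check_prior_outputs "${WORKDIR:-.}/output" MWE_ed_bovy || echo "some outputs are missing"
```

A zero return status means all eight files are present; it does not validate their contents.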

ii. MASH fit

MWE_ed_bovy_posterior.EZ.mash_model.rds: rds file containing mash_model, the vhat_file path, and the prior_file path. mash_model contains:

  1. result - contains PosteriorMean, PosteriorSD, NegativeProb, lfsr and PosteriorCov

  2. loglik

  3. vloglik

  4. null_loglik

  5. alt_loglik

  6. fitted_g - contains pi, Ulist, grid and usepointmass

  7. posterior_weights

  8. alpha

  9. lm - contains loglik_matrix and lfactors

MWE_ed_bovy_posterior.EZ.posterior.rds: rds file containing PosteriorMean, PosteriorSD, lfdr, NegativeProb and lfsr

iii. Generate Plots

MWE_ed_bovy.EZ.prior.pdf: file containing heatmap plots

MWE_ed_bovy.EZ.prior.png